[OMPI users] Correction to FAQ: How do I build BLACS with Open MPI?
In the FAQ <http://www.open-mpi.org/faq/?category=mpi-apps>, section labeled "12. How do I build BLACS with Open MPI?", it says:

INTFACE = -Df77IsF2C

That INTFACE value is only for g77, g95, and related compilers. For the Intel Fortran compiler it is:

INTFACE = -DAdd_

I have successfully built the combination of OpenMPI 1.2.3, ATLAS, BLACS, ScaLAPACK, and MUMPS using the Intel Fortran compiler on two different Debian Linux systems (3.0r3 on AMD Opterons and 4.0r0 on Intel Woodcrest/Mac Pro).

Michael
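For reference, the two defines this thread converges on could be collected in a Bmake.inc fragment like the following. This is only a sketch: the INTFACE and TRANSCOMM values come from the discussion, while the comments are my own summary of what they control.

```make
# BLACS Bmake.inc fragment for an Open MPI + Intel Fortran build (sketch).
# INTFACE selects the Fortran-to-C name-mangling convention:
#   -Df77IsF2C  for g77/g95-style names
#   -DAdd_      for compilers that append a single underscore (ifort)
INTFACE   = -DAdd_
# TRANSCOMM tells BLACS how to convert Fortran MPI handles to C handles;
# for Open MPI it should be set to use the MPI-2 conversion functions:
TRANSCOMM = -DUseMpi2
```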
Re: [OMPI users] Correction to FAQ: How do I build BLACS with Open MPI?
On Jul 12, 2007, at 4:42 PM, George Bosilca wrote:

> The INTFACE is for the namespace interface, in order to allow the Fortran code to call a C function. So it should be dependent on the compiler. Btw, for some reason I was quite sure we generate all 4 versions of the Fortran interface ... If this is true it doesn't really matter what you have in the INTFACE.

It would, except that this flag affects not only the names BLACS uses to link to OpenMPI but also which interfaces it generates (based on my experience), which then affects, for example, what happens when you build ScaLAPACK. I believe that is what I was seeing when building those three with the Intel compiler and g95; the latter was harder than expected.

> The option Jeff is referring to is the TRANSCOMM define. It allows BLACS to know how to convert between Fortran and C handles. For Open MPI this should be set to -DUseMpi2.

Fortunately that is documented in the web FAQ, though not in the BLACS documentation.

Michael

> Thanks, george.

On Jul 12, 2007, at 2:41 PM, Jeff Squyres wrote:

> On Jul 12, 2007, at 2:28 PM, Michael wrote:
>
>> In the FAQ <http://www.open-mpi.org/faq/?category=mpi-apps>, section labeled "12. How do I build BLACS with Open MPI?": INTFACE = -Df77IsF2C. That INTFACE value is only for g77, g95, and related compilers. For the Intel Fortran compiler it is: -DAdd_
>
> Really? I always thought that this flag discussed how to convert F77 MPI handles to C handles (some MPI implementations use integers for MPI handles in C, so there's no conversion necessary, but LAM and Open MPI use pointers, so using the MPI_*_f2c() functions is necessary). Hence, it's not specific to a given Fortran compiler. But I could be completely misunderstanding this value... UTK: can you confirm/deny both of these values? (I do not claim to be a BLACS expert...)
>> I have successfully built the combination of OpenMPI 1.2.3, ATLAS, BLACS, ScaLAPACK, and MUMPS using the Intel Fortran compiler on two different Debian Linux systems (3.0r3 on AMD Opterons and 4.0r0 on Intel Woodcrest/Mac Pro). Michael

--
Jeff Squyres
Cisco Systems

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] mpi_file_set_view
On Sep 17, 2007, at 7:55 PM, Jeff Squyres wrote:

> Are you using the MPI F90 bindings perchance? If so, the issue could be that the prototype for MPI_FILE_SET_VIEW is:
>
>   interface MPI_File_set_view
>     subroutine MPI_File_set_view(fh, disp, etype, filetype, datarep, &
>         info, ierr)
>       include 'mpif-config.h'
>       integer, intent(in) :: fh
>       integer(kind=MPI_OFFSET_KIND), intent(in) :: disp
>       integer, intent(in) :: etype
>       integer, intent(in) :: filetype
>       character(len=*), intent(in) :: datarep
>       integer, intent(in) :: info
>       integer, intent(out) :: ierr
>     end subroutine MPI_File_set_view
>   end interface
>
> and you might need a variable to be explicitly typed "integer(kind=MPI_OFFSET_KIND)" ...

On Sep 17, 2007, at 12:40 PM, Andrus, Mr. Brian (Contractor) wrote:

> I have run into something that I don't quite understand. I have some code that is meant to open a file for reading, but at compile time I get "Could not resolve generic procedure mpi_file_set_view"

Jeff is precisely correct. In Fortran 90, if you get a message of this type from the compiler, it means that the variable types don't line up between the subroutine/function and the calling code. The only promotion in Fortran 90 is inline, i.e. x = i * y. Fortran 90 is a strongly typed language if you use interfaces. Unfortunately I have yet to see a Fortran 90 compiler that gives an obvious error message pointing to the specific error for these interfacing errors.

Michael
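A minimal sketch of the usual fix, assuming the caller had declared the displacement as a plain default integer (the subroutine and variable names here are made up for illustration):

```fortran
! Hypothetical caller: the displacement passed to MPI_FILE_SET_VIEW
! must be of kind MPI_OFFSET_KIND, not a default integer, or the F90
! generic interface will not resolve.
subroutine set_view_example(fh, etype, filetype, info, ierr)
  include 'mpif.h'
  integer, intent(in) :: fh, etype, filetype, info
  integer, intent(out) :: ierr
  integer(kind=MPI_OFFSET_KIND) :: disp   ! not "integer :: disp"

  disp = 0_MPI_OFFSET_KIND
  call MPI_FILE_SET_VIEW(fh, disp, etype, filetype, 'native', info, ierr)
end subroutine set_view_example
```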
Re: [OMPI users] C and Fortran 77 compilers are not link compatible. Can not continue.
On Sep 20, 2007, at 7:49 AM, Tim Prins wrote:

> This is because Open MPI is finding gcc for the C compiler and ifort for the Fortran compiler.

Just to be clear: it is possible to build OpenMPI using ifort for Fortran and gcc for the C compiler, at least on Linux. I have done that on several Linux systems for many releases of OpenMPI, but have not tried it on OS X. On OS X I have been using g95.

For reference, below is my build command for Linux with ifort:

./configure F77=ifort FC=ifort --with-mpi-f90-size=small ; make all

and for OS X with g95:

./configure F77=g95 FC=g95 LDFLAGS=-lSystemStubs --with-mpi-f90-size=small ; make all

I'm not aware of any special flags needed with ifort on OS X, but -lSystemStubs is required for g95 and might be for ifort as well on OS X.

Michael
Re: [OMPI users] which alternative to OpenMPI should I choose?
On Oct 19, 2007, at 9:29 AM, Marcin Skoczylas wrote:

> Jeff Squyres wrote:
>> On Oct 18, 2007, at 9:24 AM, Marcin Skoczylas wrote:
>
> I assume this could be because of:
>
> $ /sbin/route
> Kernel IP routing table
> Destination   Gateway       Genmask        Flags Metric Ref Use Iface
> 192.125.17.0  *             255.255.255.0  U     0      0   0   eth1
> 192.168.12.0  *             255.255.255.0  U     0      0   0   eth1
> 161.254.0.0   *             255.255.0.0    U     0      0   0   eth1
> default       192.125.17.1  0.0.0.0        UG    0      0   0   eth1
>
> Actually the configuration here is quite strange; this is not a private address. The head node sits on a public address from the 192.125.17.0 net (routable from outside), workers are on 192.168.12.0.

I have a similar configuration that works just fine with OpenMPI. In my case the head node has three interfaces and the worker nodes each have two interfaces; the configuration is roughly:

master: eth0: 192.168.x.x, eth1 & eth2 bonded to 10.0.0.1
node2: eth0 & eth1 bonded to 10.0.0.2
nodeN: eth0 & eth1 bonded to 10.0.0.N

So our "outside" communication with the head node is on the 192.168 network and the internal communication is on the 10.0.0.x network. In your case the "outside" communication is on the 192.125 network and the internal communication is on the 192.168 network. The primary difference seems to be that you have all communication going over a single interface. I'm a little surprised there is any problem at all with OpenMPI and your configuration, as my configuration is more complicated.

Michael
Re: [OMPI users] OpenMP + OpenMPI
On Dec 5, 2007, at 9:57 PM, Tee Wen Kai wrote:

> I have installed openmpi-1.2.3. My system has two ethernet ports. Thus, I am trying to make use of both ports to speed up the communication process by using openmp to split into two threads.

Why not use ethernet bonding at the system level? It's a lot easier than what it sounds like you are trying to do, and it speeds up all ethernet traffic on the computer. What OS are you trying to do this on?

Michael
[OMPI users] Dual ethernet & OpenMPI
In the past I configured a Linux cluster by bonding two ethernet ports together on each node (with the master having a third port for outside communication); however, recent discussions seem to say that if I have two ethernet cards, OpenMPI can handle all the setup itself.

My question is what address ranges I should use. That is, should both ports be on the same network range, i.e. 10.0.0.x/255.255.255.0, or should they be on separate network ranges, i.e. 10.0.0.x/255.255.255.0 and 10.0.1.x/255.255.255.0? Would I need a third ethernet card for outside communication, or could one port on the master node handle both internal and external communications?

Would there be any special flags to set this up, or would OpenMPI detect the two paths -- obviously each port would have a different IP address if I'm not using bonding, so do you just double the host list? How would I test whether I have doubled my bandwidth?

Michael
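For what it's worth, Open MPI's TCP transport can be pointed at specific interfaces with MCA parameters; a sketch of the kind of command line involved (the interface names, host file, and program name here are placeholders, not from the thread):

```sh
# Restrict the TCP BTL to the two internal ports (eth0/eth1 are
# placeholders for whatever interfaces the cluster actually uses).
# Open MPI will then stripe traffic across both of them.
mpirun --mca btl tcp,sm,self \
       --mca btl_tcp_if_include eth0,eth1 \
       -np 16 -hostfile internal_hosts ./my_mpi_program
```

The complementary parameter btl_tcp_if_exclude can instead be used to keep the external-facing port out of MPI traffic.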
Re: [OMPI users] ScaLapack and BLACS on Leopard
On Mar 6, 2008, at 12:49 PM, Doug Reeder wrote:

> Greg, I would disagree with your statement that the available fortran options can't pass a cost-benefit analysis. I have found that for scientific programming (e.g., Livermore Fortran Kernels and actual PDE solvers) the code produced by the Intel compiler runs 25 to 55% faster than code from gfortran or g95. Looking at the cost of adding processors with g95/gfortran to get the same throughput as with ifort, you recover the $549 compiler cost real quickly. Doug Reeder

I'm a big fan of g95, but actually I'm seeing even greater differences in a small code I'm using for some lengthy calculations. With 14 MB of data being read into memory and processed:

Intel ifort is 7.7x faster than Linux g95 on a MacPro 3.0 GHz
Intel ifort is 2.9x faster than Linux g95 on a dual Opteron 1.4 GHz
Intel ifort is 1.8x faster than Linux g95 on an SGI Altix 350 dual Itanium2 1.4 GHz
OS X g95 is 2.7x faster than Linux g95 on a MacPro 2.66 GHz (exactly the same hardware)

The complete data set is very large, 56 GB, but that is 42 individual frequencies, whereas the 14 MB is a single frequency with data averaged over areas, so I get a flavor of the answer but not exactly the right answer. I played around with compiler options and specified the exact processor type within the limits of gcc, and I gained only fractions of a percent. A co-worker saw factor-of-2 differences between Intel's compiler and g95 with a very complicated code.

Michael
Re: [OMPI users] MPI-2 Supported on Open MPI 1.2.5?
Quick answer, till you get a complete answer: yes, OpenMPI has long supported most of the MPI-2 features.

Michael

On Mar 7, 2008, at 7:44 AM, Jeff Pummill wrote:

> Just a quick question... Does Open MPI 1.2.5 support most or all of the MPI-2 directives and features? I have a user who specified MVAPICH2 as he needs some features like extra task spawning, but I am trying to standardize on Open MPI compiled against Infiniband for my primary software stack. Thanks!
>
> -- Jeff F. Pummill, Senior Linux Cluster Administrator, University of Arkansas
Re: [OMPI users] configure:25579: error: No atomic primitives available for ppc74xx-linux-gnu
On Apr 9, 2008, at 1:57 PM, Bailey, Eric wrote:

> I am trying to use a cross compiler to build Open MPI for an embedded ppc7448 running Linux 2.6, but during configure I get the following error:
>
> configure:25579: error: No atomic primitives available for ppc74xx-linux-gnu
>
> Does anyone have an idea as to how to get past this error? ...
>
> The configure is complaining about the missing atomic directives for your processor. We have the MIPS atomic calls but not the MIPS64. We just have to add them in opal/asm/base.

Based on my reading, the PPC 7448 is basically the same processor as in my Apple PowerMac G4 <http://en.wikipedia.org/wiki/PowerPC_G4>. Therefore OpenMPI should have no trouble, as I have built OpenMPI on my G4 many times. I have no idea where the MIPS references come from; PPC has always meant PowerPC in everything I have seen, and all the MIPS chips I'm aware of are labeled R. It might be best to get a PowerMac G4 and build OpenMPI on it, but you'd probably have better luck if you install Linux on the G4 instead of building OpenMPI on OS X, as your final platform is Linux.

Michael
[OMPI users] Problem compiling open MPI on cygwin on windows
Hi,

New to open MPI, but have used MPI before. I am trying to compile open MPI under cygwin on Windows XP. From what I have read this should work? Initially I hit a problem with the 1.2.6 standard download in that a time-related header file was incorrect, and the mailing list pointed me to the trunk build to solve that problem. Now when I try to compile I am getting the error at the bottom of this mail. My question is: am I wasting my time trying to use cygwin, or are there people out there using it on cygwin? If so, is there a solution to the problem below?

Thanks in advance,
Michael.

mv -f $depbase.Tpo $depbase.Plo
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. -D_REENTRANT -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -MT paffinity_windows_module.lo -MD -MP -MF .deps/paffinity_windows_module.Tpo -c paffinity_windows_module.c -DDLL_EXPORT -DPIC -o .libs/paffinity_windows_module.o
paffinity_windows_module.c:41: error: parse error before "sys_info"
paffinity_windows_module.c:41: warning: data definition has no type or storage class
paffinity_windows_module.c: In function `windows_module_get_num_procs':
paffinity_windows_module.c:90: error: request for member `dwNumberOfProcessors' in something not a structure or union
paffinity_windows_module.c: In function `windows_module_set':
paffinity_windows_module.c:96: error: `HANDLE' undeclared (first use in this function)
paffinity_windows_module.c:96: error: (Each undeclared identifier is reported only once
paffinity_windows_module.c:96: error: for each function it appears in.)
paffinity_windows_module.c:96: error: parse error before "threadid"
paffinity_windows_module.c:97: error: `DWORD_PTR' undeclared (first use in this function)
paffinity_windows_module.c:99: error: `threadid' undeclared (first use in this function)
paffinity_windows_module.c:99: error: `process_mask' undeclared (first use in this function)
paffinity_windows_module.c:99: error: `system_mask' undeclared (first use in this function)
paffinity_windows_module.c: In function `windows_module_get':
paffinity_windows_module.c:116: error: `HANDLE' undeclared (first use in this function)
paffinity_windows_module.c:116: error: parse error before "threadid"
paffinity_windows_module.c:117: error: `DWORD_PTR' undeclared (first use in this function)
paffinity_windows_module.c:119: error: `threadid' undeclared (first use in this function)
paffinity_windows_module.c:119: error: `process_mask' undeclared (first use in this function)
paffinity_windows_module.c:119: error: `system_mask' undeclared (first use in this function)
make[2]: *** [paffinity_windows_module.lo] Error 1
make[2]: Leaving directory `/home/Michael/mpi/openmpi-1.3a1r18208/opal/mca/paffinity/windows'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/Michael/mpi/openmpi-1.3a1r18208/opal'
make: *** [all-recursive] Error 1
Re: [OMPI users] install intel mac with Laopard
On Apr 24, 2008, at 10:48 AM, Jeff Squyres (jsquyres) wrote:

> You probably want to use all the intel compilers, not just ifort. CC=icc CXX=icpc FC=ifort F77=ifort

I have long used gcc and ifort on 64-bit Linux (since SUSE did 64-bit Linux on AMD Opterons), and I don't see a good reason why OS X should be/remain a problem. There are very major reasons to use gcc and ifort: it works, and buying a bunch of ifort C licenses is an unnecessary expense unless there is some type of speed advantage to compiling OpenMPI with the Intel compilers instead of gcc. Given that we have very little C code that requires a highly optimized compiler for the latest Intel Xeon, we will continue to use gcc when building OpenMPI.

However, given that ifort generates code that is 7.7 times faster under Linux than that generated by the latest g95 and gfortran in my tests on a 5300-series Xeon (the first dual quad 3.0 GHz Mac Pro), we will also continue using ifort for the foreseeable future. Strangely, g95 is 2.7 times faster under OS X (with virus-checking) than under Linux on a 5100-series Xeon (the first quad 2.66 GHz Mac Pro). This is something I will test again with Intel's OS X Fortran compiler against their Linux compiler to see if there is a difference there.

Michael
Re: [OMPI users] Memory question and possible bug in 64bit addressing under Leopard!
On Apr 25, 2008, at 4:10 PM, Brian Barrett wrote:

> On Apr 25, 2008, at 2:06 PM, Gregory John Orris wrote:
>
>> produces a core dump on a machine with 12Gb of RAM, and the error message: mpiexec noticed that job rank 0 with PID 75545 on node mymachine.com exited on signal 4 (Illegal instruction). However, substituting in float *X = new float[n]; for float X[n]; succeeds!
>
> You're running off the end of the stack because of the large amount of data you're trying to put there. OS X by default has a tiny stack size, so codes that run on Linux (which defaults to a much larger stack size) sometimes show this problem. Your best bets are either to increase the max stack size or (more portably) just allocate everything on the heap with malloc/new.

Where are Fortran 90 arrays allocated, stack or heap? I can't see us using malloc in our Fortran 90 codes. I need to understand this before I start configuring a new cluster; I was planning for it to run OS X instead of Linux. At the moment I don't have an OS X system with enough RAM to test this.

Michael
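For what it's worth, a sketch of the distinction in Fortran 90 terms (the program and sizes here are illustrative, not from the thread): a fixed-size local array typically lives in static memory or on the stack, and automatic arrays live on the stack, while an ALLOCATABLE array is allocated from the heap, which sidesteps OS X's small default stack limit.

```fortran
program heap_vs_stack
  implicit none
  integer, parameter :: n = 3000000
  ! real :: x(n)              ! static/stack storage; a large automatic
  !                           ! array can overflow a small stack limit
  real, allocatable :: x(:)   ! heap storage, limited only by memory
  allocate(x(n))
  x = 0.0
  print *, 'allocated ', size(x), ' elements on the heap'
  deallocate(x)
end program heap_vs_stack
```

The exact placement of fixed-size locals is compiler-dependent, so for large working arrays ALLOCATABLE is the portable choice.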
Re: [OMPI users] Error: SAVE statement at (1) follows blanket SAVE statement in file mpif.h
The problem discussed here is with the MPICH2 version of MPI, not OpenMPI.

Michael

On Nov 18, 2006, at 9:22 AM, Jeff Squyres wrote:

> We do not appear to have the token "save" anywhere in our mpif.h file. Can you send a copy of the mpif.h file that your compiler is finding (and ensure that it belongs to Open MPI)? Also please send the information regarding compilation problems listed on the "Getting help" page on the web site. Thanks!

On Nov 16, 2006, at 4:11 PM, Yu Chen wrote:

> Hello,
>
> Not sure if it's openmpi related or the program I am installing. I installed openmpi using g95 as the F77 and F90 compiler with flags "-ffixed-line-length-132 -fno-underscoring" on a PowerMac G5 with OS X 10.4 without any problems. Then I tried to compile this program (CYANA, if it matters; it's a molecular calculation program) with the openmpi-generated mpif90 wrapper with the same flags, and it gave the following errors. Wondering if someone has an idea about this; googled it without much help. Thanks a lot in advance.
>
> ../../etc/prepare -c -Dg95 -Dmpi -Dapple -Dapple_ompig95-withflags -w inclan.for > inclan.f
> /sw/mpich2_g95_withflags/bin/mpif90 -c -ffixed-line-length-132 -fno-underscoring inclan.f
> In file mpif.h:420
> Included at inclan.f:26
> SAVE /MPIPRIV1/,/MPIPRIV2/
> 1
> Error: SAVE statement at (1) follows blanket SAVE statement
> In file mpif.h:423
> Included at inclan.f:26
> SAVE /MPIPRIVC/
> 1
> Error: SAVE statement at (1) follows blanket SAVE statement
> make[2]: *** [inclan.o] Error 1
>
> Yu Chen
> Howard Hughes Medical Institute
> University of Maryland at Baltimore County
[OMPI users] install script issue
Building openmpi-1.3a1r13525 on OS X 10.4.8 (PowerPC), using my standard compile line:

./configure F77=g95 FC=g95 LDFLAGS=-lSystemStubs --with-mpi-f90-size=large --with-f90-max-array-dim=3 ; make all

and after installing I found that I couldn't compile, because of the following:

-rw------- 1 root wheel 640216 Feb 7 14:48 libmpi_f90.a

This has not happened in the past, and I followed the same procedures I've been using for many months. One slight difference is that I installed using the command "make install all" rather than "make install"; also, I had uninstalled the previous version prior to installing this version.

Michael
[OMPI users] Fortran90 interfaces--problem?
I have discovered a problem with the Fortran90 interfaces for all types of communication when one uses derived datatypes (I'm currently using openmpi-1.3a1r13918 [for testing] and openmpi-1.1.2 [for compatibility with an HPC system]), for example:

call MPI_RECV(tsk,1,MPI_TASKSTATE,src,1,MPI_COMM_WORLD,MPI_STATUS_IGNORE,ier)

where tsk is a Fortran 90 structure and MPI_TASKSTATE has been created by MPI_TYPE_CREATE_STRUCT. At the moment I can't imagine a way to modify the OpenMPI interface generation to work around this besides switching to --with-mpi-f90-size=small.

Michael
[OMPI users] MPI_PACK very slow?
I have a section of code where I need to send 8 separate integers via BCAST. Initially I was just putting the 8 integers into an array and then sending that array. I just tried using MPI_PACK on those 8 integers and I'm seeing a massive slowdown in the code; I have a lot of other communication, and this section is used only 5 times. I went from 140 seconds to 277 seconds on 16 processors using TCP via a dual gigabit ethernet setup (I'm the only user working on this system today). This was run with OpenMPI 1.1.2 to maintain compatibility with a major HPC site.

Is there a known problem with MPI_PACK/UNPACK in OpenMPI?

Michael
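For context, the pattern in question is roughly the following sketch (the subroutine and variable names are made up; the thread does not show the actual code):

```fortran
! Sketch: on the root rank, pack 8 integers into a byte buffer and
! broadcast the packed buffer, instead of broadcasting the plain
! integer array directly.  Receivers would call MPI_UNPACK with the
! same layout after the broadcast.
subroutine bcast_packed(vals, ier)
  include 'mpif.h'
  integer, intent(inout) :: vals(8)
  integer, intent(out) :: ier
  character(len=64) :: buf
  integer :: pos

  pos = 0
  call MPI_PACK(vals, 8, MPI_INTEGER, buf, 64, pos, MPI_COMM_WORLD, ier)
  call MPI_BCAST(buf, pos, MPI_PACKED, 0, MPI_COMM_WORLD, ier)
end subroutine bcast_packed
```

For 8 integers the packing itself is a handful of memory copies, which is consistent with George's point below that MPI_Pack cannot plausibly account for tens of seconds per call.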
Re: [OMPI users] MPI_PACK very slow?
I discovered I made a minor change that cost me dearly (I had thought I had tested this single change but perhaps didn't track the timing data closely). MPI_Type_create_struct performs well only when all the data is contiguous in memory (at least for OpenMPI 1.1.2). Is this normal or expected?

In my case the program has an f90 structure with 11 integers, 2 logicals, and five 50-element integer arrays, but at the first stage of the program only the first element of those arrays is used. Even so, using MPI_Type_create_struct it is more efficient to send the entire 263 words of contiguous memory (58 seconds) than to try and send only 18 words of noncontiguous memory (64 seconds). At the second stage it's 33 words, and at that stage it becomes 47 seconds vs. 163 seconds, an extra 116 seconds, which dominates the push of my overall wall clock time from 130 to 278 seconds. The third stage increases from 13 seconds to 37 seconds.

Because I need to send this block of data back and forth a lot, I was hoping to find a way to speed up the transfer of this odd block of data and a couple of other variables. I may try PACK and UNPACK on the structure, but calling those lots of times can't be more efficient. Previously I was equivalencing the structure to an integer array and sending the integer array as a quick, dirty solution to get started, and it worked -- not completely portable, no doubt.

Michael

ps. I don't currently have valgrind installed on this cluster, and valgrind is not part of the Debian Linux 3.1r3 distribution. Without any experience with valgrind, I'm not sure how useful it will be with an MPI program of 500+ subroutines and 50K+ lines running on 16 processes. It took us a bit to get profiling working for the OpenMP version of this code.

On Mar 6, 2007, at 11:28 AM, George Bosilca wrote:

> I doubt this comes from the MPI_Pack/MPI_Unpack. The difference is 137 seconds for 5 calls. That's basically 27 seconds per call to MPI_Pack, for packing 8 integers.
> I know the code and I'm affirmative there is no way to spend 27 seconds over there. Can you run your application using valgrind with the callgrind tool? This will give you some basic information about where the time is spent, and give us additional information about where to look.
>
> Thanks,
> george.

On Mar 6, 2007, at 11:26 AM, Michael wrote:

>> I have a section of code where I need to send 8 separate integers via BCAST. Initially I was just putting the 8 integers into an array and then sending that array. I just tried using MPI_PACK on those 8 integers and I'm seeing a massive slowdown in the code; I have a lot of other communication, and this section is used only 5 times. I went from 140 seconds to 277 seconds on 16 processors using TCP via a dual gigabit ethernet setup (I'm the only user working on this system today). This was run with OpenMPI 1.1.2 to maintain compatibility with a major HPC site. Is there a known problem with MPI_PACK/UNPACK in OpenMPI? Michael

"Half of what I say is meaningless; but I say it so that the other half may reach you" Kahlil Gibran
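For reference, the contiguous-send variant discussed above would be built roughly like this. This is only a sketch: the layout assumes 4-byte integers and that the two logicals occupy one word each (both assumptions on my part), and the counts come from the 11 integers + 2 logicals + 5 x 50 integers described in the thread.

```fortran
! Sketch: describe the 263-word structure to MPI as two contiguous
! blocks with MPI_TYPE_CREATE_STRUCT, so the whole thing is sent as
! one contiguous region (the fast path described above).
subroutine build_task_type(newtype, ier)
  include 'mpif.h'
  integer, intent(out) :: newtype, ier
  integer :: blocklens(2), types(2)
  integer(kind=MPI_ADDRESS_KIND) :: displs(2)

  ! 13 leading words (11 integers + 2 logicals, assumed word-sized),
  ! then 5 x 50 = 250 integers, laid out contiguously in memory.
  blocklens = (/ 13, 250 /)
  displs    = (/ 0_MPI_ADDRESS_KIND, 13 * 4_MPI_ADDRESS_KIND /)
  types     = (/ MPI_INTEGER, MPI_INTEGER /)
  call MPI_TYPE_CREATE_STRUCT(2, blocklens, displs, types, newtype, ier)
  call MPI_TYPE_COMMIT(newtype, ier)
end subroutine build_task_type
```

In real code the displacements should come from MPI_GET_ADDRESS on the actual structure members rather than hand-computed byte offsets.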
[OMPI users] LSF & OpenMPI
What is the status of LSF and OpenMPI? I'm running on a major HPC system using GM & LSF, and we have to use a number of workarounds so that we can use OpenMPI. Specifically, using the scripts on this system we have to have our csh file source a file to set up the environment on the nodes. Using OpenMPI's mpirun directly does not work because, at the very minimum, the hosts to run on are not available to it; I had a workaround, but there it seems that the environment is not passed to the nodes. The notes from the support people indicate that the problem is that openmpi's mpirun command doesn't recognize the "-gm-copy-env" option. Does this mean anything to anyone?

Open MPI: 1.1.2
Open MPI SVN revision: r12073
MCA btl: self (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: mvapi (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)

Have there been any improvements in the compatibility of OpenMPI with LSF since version 1.1.2? Does anyone on the OpenMPI team have access to a system using the LSF batch queueing system? Is a machine with GM and LSF available yet?

Michael
[OMPI users] portability of the executables compiled with OpenMPI
I'm having trouble with the portability of executables compiled with OpenMPI. I suspect the sysadmins on the HPC system I'm using changed something, because I think it worked previously.

Situation: I'm compiling my code locally on a machine with just ethernet interfaces and an OpenMPI 1.1.2 that I built. When I attempt to run that executable on an HPC machine with OpenMPI 1.1.2 and InfiniBand interfaces, I get messages about "can't find libmosal.so.0.0" -- I'm certain this wasn't happening earlier. I can compile on that machine and run on it, even though there is no libmosal.* in my path. mpif90 --showme on that system gives me:

/opt/compiler/intel/compiler91/x86_64/bin/ifort -I/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/include -pthread -I/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/lib -L/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/lib -L/opt/gm/lib64 -lmpi_f90 -lmpi -lorte -lopal -lgm -lvapi -lmosal -lrt -lnuma -ldl -Wl,--export-dynamic -lnsl -lutil -ldl

I suspect that read access to libmosal.so has been removed and that somehow, when I link on that machine, I'm getting a static library, i.e. libmosal.a. Does this make any sense? Is there a flag in this compile line that permits linking an executable even when the person doing the linking does not have access to all the libraries, i.e. --export-dynamic?

Michael
Re: [OMPI users] portability of the executables compiled with OpenMPI
On Mar 15, 2007, at 12:18 PM, Michael wrote:

> I'm having trouble with the portability of executables compiled with OpenMPI. I suspect the sysadmins on the HPC system I'm using changed something, because I think it worked previously.

Apparently there was a misconfiguration, i.e. missing libraries and links on some nodes. I would like to hear just how portable an executable compiled against OpenMPI shared libraries should be. I'm compiling on a Debian Linux system with dual 1.3 GHz AMD Opterons per node and an internal network of dual gigabit ethernet. I'm planning on a SUSE Linux Enterprise Server 9 system with dual 3.6 GHz Intel Xeon EM64T per node and an internal network using Myrinet. I believe I actually had this working previously, and now there is a mix of libraries missing from some nodes.

Michael
Re: [OMPI users] portability of the executables compiled with OpenMPI
On Mar 22, 2007, at 7:55 AM, Jeff Squyres wrote:

> On Mar 15, 2007, at 12:18 PM, Michael wrote:
>
>> Situation: I'm compiling my code locally on a machine with just ethernet interfaces and an OpenMPI 1.1.2 that I built. When I attempt to run that executable on an HPC machine with OpenMPI 1.1.2 and InfiniBand interfaces I get messages about "can't find libmosal.so.0.0" -- I'm certain this wasn't happening earlier. I can compile on this machine and run on it, even though there is no libmosal.* in my path. mpif90 --showme on this system gives me: /opt/compiler/intel/compiler91/x86_64/bin/ifort -I/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/include -pthread -I/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/lib -L/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/lib -L/opt/gm/lib64 -lmpi_f90 -lmpi -lorte -lopal -lgm -lvapi -lmosal -lrt -lnuma -ldl -Wl,--export-dynamic -lnsl -lutil -ldl
>
> Based on this output, I assume you have configured OMPI with either --enable-static or otherwise including all plugins in libmpi.so, right?

No, I did not configure OpenMPI on this machine. I believe OpenMPI was configured non-static by the installers, based on the messages and the dependency on the missing libraries. The issue was that some of the 1000+ nodes on this major HPC machine were missing libraries needed for OpenMPI, but because of the low usage of OpenMPI I'm the first to discover the problem. For whatever reason these libraries are not on the front-end machines that feed the main system. It's always nice running OpenMPI on your own machine, but not everyone can always do that.

The way I read my experience: OpenMPI's libmpi.so depends on different libraries on different machines. This means that if you don't compile static, you can compile on a machine that does not have the libraries for expensive interfaces and run on another machine with those expensive interfaces -- that's what I am doing now successfully.

Michael
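A quick way to check which shared libraries an executable will try to resolve on a given node is the standard ldd tool (a sketch; the program name is a placeholder):

```sh
# List the dynamic dependencies of the executable; a library the node
# is missing (such as libmosal.so.0.0 above) shows up as "not found".
ldd ./my_mpi_program
ldd ./my_mpi_program | grep "not found"
# Running the same command on a compile node and a compute node makes
# misconfigured or missing library installs easy to spot.
```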
Re: [OMPI users] portability of the executables compiled with OpenMPI
For your reference: the following cross-compile/run combination with OpenMPI 1.1.4 is currently working for me.

I'm compiling on a Debian Linux system with dual 1.3 GHz AMD Opterons per node and an internal network of dual gigabit ethernet, with OpenMPI compiled with Intel Fortran 9.1.041 and gcc 3.3.5.

I'm running on a SUSE Linux Enterprise Server 9 system with dual 3.6 GHz Intel Xeon EM64T per node and an internal network using Myrinet, with OpenMPI compiled with Intel Fortran 9.1.041 and Intel icc 9.1.046.

There is enough compatibility between the two different libmpi.so's that I do not have a problem. I have to periodically check the second system to see if it has been updated, in which case I have to update my system.

Michael
[OMPI users] Buffered sends
Is there a known issue with buffered sends in OpenMPI 1.1.4? I changed a single send, which is called thousands of times, from MPI_SEND (& MPI_ISEND) to MPI_BSEND (& MPI_IBSEND), and my Fortran 90 code slowed down by a factor of 10. I've looked at several references and I can't see where I'm making a mistake.

The MPI_SEND is for MPI_PACKED data, so its first parameter is an allocated character array. I also allocated a character array for the buffer passed to MPI_BUFFER_ATTACH. Looking at the model implementation in a reference, they give a model of using MPI_PACKED inside MPI_BSEND; I was wondering if this could be a problem, i.e. packing packed data?

Michael

ps. I have to use OpenMPI 1.1.4 to maintain compatibility with a major HPC center.
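For reference, the attach/send pattern being described looks roughly like this sketch (the subroutine, sizes, and tag are made up for illustration; note that the attach buffer must include MPI_BSEND_OVERHEAD per pending message):

```fortran
! Sketch: attach a user buffer, then issue a buffered send of
! already-packed data.
subroutine bsend_packed(packbuf, nbytes, dest, ier)
  include 'mpif.h'
  character(len=*), intent(in) :: packbuf
  integer, intent(in) :: nbytes, dest
  integer, intent(out) :: ier
  character, allocatable :: attach_buf(:)
  integer :: bufsize

  bufsize = nbytes + MPI_BSEND_OVERHEAD
  allocate(attach_buf(bufsize))
  call MPI_BUFFER_ATTACH(attach_buf, bufsize, ier)
  call MPI_BSEND(packbuf, nbytes, MPI_PACKED, dest, 0, MPI_COMM_WORLD, ier)
  ! MPI_BUFFER_DETACH blocks until all buffered messages are delivered.
  call MPI_BUFFER_DETACH(attach_buf, bufsize, ier)
  deallocate(attach_buf)
end subroutine bsend_packed
```

In real code the buffer would be attached once at startup rather than per send; MPI_BSEND always copies the message into the attached buffer, so some extra cost over MPI_SEND is expected even when used correctly.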
[OMPI users] OpenMPI 1.1.4 vs. 1.2
To maintain compatibility with a major HPC center I upgraded(?) from OpenMPI 1.1.4 to OpenMPI 1.2 on my local cluster. In testing on my local cluster, 13 dual-Opteron Linux boxes with dual gigabit ethernet, I discovered that my program runs slower using OpenMPI 1.2 than OpenMPI 1.1.4 (780.3 versus 402.4 seconds with 3 processes -- tested twice to be certain). This particular version of my program was designed to minimize the amount of communication, and the only MPI calls that get used a lot are MPI_SEND and MPI_RECV with MPI_PACKED data (so MPI_PACK and MPI_UNPACK also get used a lot). Was there a known problem with OpenMPI 1.2 (r14027) and ethernet communication that got fixed later? The same executable run at the major center seems fine, but they have Myrinet.

Michael
Re: [OMPI users] ethernet bonding
On May 24, 2007, at 10:38 AM, Adams, Samuel D Contr AFRL/HEDR wrote: We recently got 33 new cluster nodes, all of which have two onboard GigE nics. We also got two PowerConnect 2748 48-port switches which support IEEE 802.3ad (link aggregation). I have configured the nodes to do Ethernet bonding to aggregate the two nics into one bonded device: ... Now I am wondering what is the best way to configure my switches. I guess I could do it in two ways: use IEEE 802.3ad on the switch, plug both nics of a node into one switch, and put some nodes on either switch; or perhaps, for each node, plug one nic into one switch and the second nic into the other switch. Based on the configuration of a system that we purchased and our experience with that system, I would suggest plugging one nic from each node into one switch and the other nic from each node into the other switch. This assumes the two switches have more ports than you have nodes. I have no experience with IEEE 802.3ad aggregation; someone else would have to speak to that. There is also the question of which bonding mode you choose: which modes would work at all, and which gives the best performance. Michael --- "Producing a system from a specification is like walking on water-- it's easier if it's frozen."
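For nodes like these, the bonding mode is picked in the interface configuration. The stanza below is a sketch for a Debian-style /etc/network/interfaces (it assumes the ifenslave package; interface names and addresses are example values). Note that bond-mode 802.3ad requires the matching aggregation group on the switch side, which rules out the one-nic-per-switch wiring unless the switches support cross-switch aggregation; modes such as balance-alb do not need switch support.

```
# Hypothetical /etc/network/interfaces stanza (example values throughout)
auto bond0
iface bond0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100
```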
[O-MPI users] latest g95: size of FORTRAN integer(selected_int_kind(2))... unknown
Building Open MPI 1.0.1 on a PowerMac running OS X 10.4.4 using 1) Apple gnu compilers from Xcode 2.2.1 2) fink-installed g77 3) latest g95 "G95 (GCC 4.0.1 (g95!) Jan 23 2006)" (the binary from G95 Home) setenv F77 g77 setenv FC g95 ./configure In the G95 section of the configure I get checking size of FORTRAN integer(selected_int_kind(2))... unknown configure: WARNING: *** Problem running configure test! Gzipped config.log attached. If I change to the older Fink g95 "G95 (GCC 4.0.2 (g95!) Dec 19 2005)" I don't see this problem System info: uname -a Darwin 8.4.0 Darwin Kernel Version 8.4.0: Tue Jan 3 18:22:10 PST 2006; root:xnu-792.6.56.obj~1/RELEASE_PPC Power Macintosh powerpcgcc --version powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc. build 5026) g++ --version powerpc-apple-darwin8-g++-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5250) g77 --version GNU Fortran (GCC) 3.4.3 Details on latest G95 build: g95 -v Using built-in specs. Target: Configured with: /Users/andy/g95/osx/gcc.osx/configure --enable- languages=c Thread model: posix gcc version 4.0.1 (g95!) Jan 23 2006 Details on older Fink g95 build: g95 -v Using built-in specs. Target: Configured with: ../configure --prefix=/sw/lib/gcc-lib/powerpc-apple- darwin8/4.0.2 --with-gmp=/sw --enable-languages=c --disable-checking --with-included-gettext Thread model: posix gcc version 4.0.2 (g95!) Dec 19 2005 config.log.gz Description: GNU Zip compressed data
Re: [O-MPI users] latest g95: size of FORTRAN integer(selected_int_kind(2))... unknown
Confirmed by the author of g95: On Jan 26, 2006, at 3:40 PM, Andy Vaught wrote: It's a known issue. Use LDFLAGS=-lSystemStubs on the configure line. On Jan 26, 2006, at 11:35 AM, Kraig Winters wrote: I believe that ld: Undefined symbols: _fprintf$LDBLStub can be fixed by adding -L/usr/lib -lSystemStubs to your link statement. For xlf, this can be done once and for all in the compiler configuration file. I don't know if something similar can be done for g95. This problem seems to have started with 10.4. Kraig On Jan 26, 2006, at 4:57 AM, Jeff Squyres wrote: It looks like your g95 may not be installed correctly. Here's the relevant information from the config.log: configure:32697: gcc -O3 -DNDEBUG -fno-strict-aliasing -I. -c conftest.c configure:32704: $? = 0 configure:32714: g95 conftestf.f90 conftest.o -o conftest ld: Undefined symbols: _fprintf$LDBLStub That is, configure tried to compile a .f90 file and link it with a C-compiled .o file (normally, this should work just fine). In performing the final link, however, it did not find the symbol fprintf(). This seems to indicate that the g95 compiler was not able to find the C libraries properly. Can you verify that everything is installed properly, and that g95 is able to link to C libraries? On Jan 24, 2006, at 3:11 PM, Michael Kluskens wrote: Building Open MPI 1.0.1 on a PowerMac running OS X 10.4.4 using 1) Apple gnu compilers from Xcode 2.2.1 2) fink-installed g77 3) latest g95 "G95 (GCC 4.0.1 (g95!) Jan 23 2006)" (the binary from G95 Home) ... checking size of FORTRAN integer(selected_int_kind(2))... unknown configure: WARNING: *** Problem running configure test! ... ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
[O-MPI users] f90 compiling: USE MPI vs. include 'mpif.h'
Question regarding f90 compiling: using USE MPI instead of include 'mpif.h' makes the compilation take an extra two minutes using g95 under OS X 10.4.4 (simple test program: 115 seconds versus 0.2 seconds). Is this normal? Michael
Re: [OMPI users] Runtime replacement of mpi libraries?
Another option is SGI PerfBoost. It will let you run apps compiled against other ABIs with SGI MPT with practically no performance loss. $ module load openmpi $ make $ module unload openmpi $ module load mpt perfboost $ mpiexec_mpt -np 2 perfboost -ompi a.out On 09/11/2014 01:28 PM, JR Cary wrote: We need to build an application on our machine with one mpi (e.g. openmpi), but for performance reasons, upon installation, we would like to runtime link to a different, specialized mpi, such as an SGI implementation provided for their systems. Can one expect this to work? I tried this with openmpi and mpich, building the code against shared openmpi and then changing the LD_LIBRARY_PATH to point to the shared mpich. This failed due to the sonames being different. $ ldd foo | grep mpi libmpi_usempi.so.1 => not found libmpi_mpifh.so.2 => not found libmpi.so.1 => not found libmpi_cxx.so.1 => not found but in the mpich distribution one has different sonames libmpi.so.12 so the runtime loader will not load the mpich libraries instead. and the fortran libraries (which may not matter to us) have different names, $ \ls /contrib/mpich-shared/lib/*.so.12 /contrib/mpich-shared/lib/libmpicxx.so.12 /contrib/mpich-shared/lib/libmpifort.so.12 /contrib/mpich-shared/lib/libmpi.so.12 Is there a general approach to this? Or in practice, must one build on a machine to use that machine's MPI? Thx.John Cary ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2014/09/25311.php -- Michael A. Raymond SGI MPT Team Leader 1 (651) 683-7523
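The soname check described above can be scripted. This sketch inlines sample ldd output (it assumes no MPI binary is at hand here) and simply counts the MPI libraries the runtime loader could not resolve, which is the symptom John saw when pointing LD_LIBRARY_PATH at a different MPI:

```shell
# Count unresolved MPI libraries in ldd-style output (sample data)
ldd_out='libmpi_usempi.so.1 => not found
libmpi_mpifh.so.2 => not found
libmpi.so.1 => /contrib/openmpi/lib/libmpi.so.1'
printf '%s\n' "$ldd_out" | grep -c 'not found'   # prints 2
```

A nonzero count means the sonames recorded at link time have no match on the runtime library path, so the substitution cannot work without an ABI-compatibility shim such as PerfBoost.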
Re: [OMPI users] Runtime replacement of mpi libraries?
Doing the `module load perfboost` sets the LD_LIBRARY_PATH. To see more, after doing the module load of both SGI modules, do an `ldd` on your app. On 09/11/2014 03:40 PM, John Cary wrote: Thanks much! So does mpiexec_mpt then set the LD_LIBRARY_PATH as needed? John On 9/11/2014 1:27 PM, Michael Raymond wrote: Another option is SGI PerfBoost. It will let you run apps compiled against other ABIs with SGI MPT with practically no performance loss. $ module load openmpi $ make $ module unload openmpi $ module load mpt perfboost $ mpiexec_mpt -np 2 perfboost -ompi a.out -- Michael A. Raymond SGI MPT Team Leader 1 (651) 683-7523
[OMPI users] Strange behavior of OMPI 1.8.3
Hello, I've configured OpenMPI 1.8.3 with the following command line $ AXFLAGS="-xSSE4.2 -axAVX,CORE-AVX-I,CORE-AVX2" $ myFLAGS="-O2 ${AXFLAGS}" ; $ ./configure --prefix=${proot} \ --with-lsf \ --with-cma \ --enable-peruse --enable-branch-probabilities \ --enable-mpi-fortran=all \ --enable-cxx-exceptions \ --enable-ipv6 \ --enable-sparse-groups \ --with-threads=posix \ --enable-mpi-thread-multiple \ --enable-openib-connectx-xrc \ --enable-mtl-portals4-flow-control \ --with-hwloc=internal \ --enable-orterun-prefix-by-default \ --with-ident-string="MikeT_15.0" \ CC=icc CFLAGS="$myFLAGS" \ CXX=icpc CXXFLAGS="$myFLAGS" \ F77=ifort FFLAGS="$myFLAGS" FC=ifort FCFLAGS="$myFLAGS" \ LIBS="-lnsl" \ && make -j 8 && make install but when I run it with $ mpirun --bind-to core --map-by core -mca mpi_show_mca_params all --host H1,H2 -np 2 ~/performance/analysis/networks/Intel64_SandyBridge/HPCI/OMB_4.3.0/ompi_1.8.2/cpu/osu-micro-benchmarks-4.3/libexec/osu-micro-benchmarks/mpi/one-sided/osu_put_bibw H H I am getting " [H1:33580] [[41149,0],0] ORTE_ERROR_LOG: Address family not supported by protocol in file oob_tcp_listener.c at line 120 [h2:33580] [[41149,0],0] ORTE_ERROR_LOG: Address family not supported by protocol in file oob_tcp_component.c at line 584 " Any suggestions? Thanks! Michael
Re: [OMPI users] Strange behavior of OMPI 1.8.3
Hi Howard, We have NOT defined IPv6 on the nodes. Actually I was looking at the location of the code that complains and I also saw references to IPv6 sockets. Thanks a lot for the suggestion! I'll try this out tomorrow. Regards Michael On Mon, Oct 6, 2014 at 11:07 PM, Howard Pritchard wrote: > Hi Michael, > > If you do not include --enable-ipv6 in the config line, do you still > observe the problem? > Is it possible that one or more interfaces on nodes H1 and H2 do not have > ipv6 enabled? > > Howard
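If dropping IPv6 support does fix it, the rebuild only needs the original configure invocation minus --enable-ipv6, since the "Address family not supported by protocol" errors above came from the oob_tcp socket setup. A small sketch of filtering the flag list rather than retyping it (the list here is abbreviated to three of the flags from the original command line):

```shell
# Reuse the earlier configure flags, dropping --enable-ipv6 (abbreviated list)
flags="--with-lsf --enable-ipv6 --enable-mpi-thread-multiple"
printf '%s\n' $flags | grep -v -- '--enable-ipv6'
# prints the remaining flags, one per line, without --enable-ipv6
```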
[OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.
Dear OpenMPI list, As far as I know, when we build OpenMPI itself with GNU or Intel compilers we expect that the subsequent MPI application binary will use the same compiler set and run-times. Would it be possible to build OpenMPI with the GNU tool chain but then subsequently instruct the OpenMPI compiler wrappers to use the Intel compiler set? Would there be any issues with compiling C++ / Fortran or corresponding OMP codes? In general, what is a clean way to build OpenMPI with a GNU compiler set but then instruct the wrappers to use the Intel compiler set? Thanks! Michael ___ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users
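One mechanism relevant here: Open MPI's wrapper compilers honor the OMPI_CC / OMPI_CXX / OMPI_FC environment variables, which swap the underlying compiler without rebuilding the stack. As the replies below the original thread make clear, though, this is only safe where the runtimes are actually compatible, and in particular not for the Fortran module bindings. A sketch:

```shell
# Point the Open MPI wrappers at the Intel compilers for this shell session
export OMPI_CC=icc OMPI_CXX=icpc OMPI_FC=ifort
# mpicc --showme   # would now report an icc command line instead of gcc
echo "$OMPI_CC $OMPI_FC"   # prints: icc ifort
```

`mpicc --showme` is the quickest way to confirm which underlying compiler a wrapper will invoke.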
Re: [OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.
Thanks for the note. How about OMP runtimes though? Michael On Mon, Sep 18, 2017 at 3:21 PM, n8tm via users wrote: > On Linux and Mac, Intel c and c++ are sufficiently compatible with gcc and > g++ that this should be possible. This is not so for Fortran libraries or > Windows.
Re: [OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.
Hello OpenMPI team, Thank you for the insightful feedback. I am not claiming in any way that it is a meaningful practice to build the OpenMPI stack with one compiler and then just try to convince / force it to use another compilation environment to build MPI applications. There are occasions, though, where one *may only have an OpenMPI* stack built, say, by GNU compilers, but for efficiency of execution of the resulting MPI applications may try to use Intel / PGI compilers with the same OpenMPI stack to compile MPI applications. It is too much unnecessary trouble to use the same MPI stack with different compilation environments. Thank you, Michael On Mon, Sep 18, 2017 at 7:35 PM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Even if i do not fully understand the question, keep in mind Open MPI > does not use OpenMP, so from that point of view, Open MPI is > independent of the OpenMP runtime. > > Let me emphasize what Jeff already wrote: use different installs > of Open MPI (and you can use modules or lmod in order to choose > between them easily) and always use the compilers that were used to > build Open MPI. This is mandatory if you use Fortran bindings (use mpi > and use mpi_f08), and you'd better keep yourself out of trouble with > C/C++ and mpif.h > > Cheers, > > Gilles
Re: [OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.
OMP is yet another source of incompatibility between GNU and Intel environments. So compiling, say, Fortran OMP code into a library and trying to link it with Intel Fortran codes just aggravates the problem. Michael On Mon, Sep 18, 2017 at 7:35 PM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Even if i do not fully understand the question, keep in mind Open MPI > does not use OpenMP, so from that point of view, Open MPI is > independent of the OpenMP runtime.
[OMPI users] Segmentation fault with SLURM and non-local nodes
Hi, I'm not sure whether this problem is with SLURM or OpenMPI, but the stack traces (below) point to an issue within OpenMPI. Whenever I try to launch an MPI job within SLURM, mpirun immediately segmentation faults -- but only if the machine that SLURM allocated to MPI is different to the one that I launched the MPI job. However, if I force SLURM to allocate only the local node (ie, the one on which salloc was called), everything works fine. Failing case: michael@ipc ~ $ salloc -n8 mpirun --display-map ./mpi JOB MAP Data for node: Name: ipc4 Num procs: 8 Process OMPI jobid: [21326,1] Process rank: 0 Process OMPI jobid: [21326,1] Process rank: 1 Process OMPI jobid: [21326,1] Process rank: 2 Process OMPI jobid: [21326,1] Process rank: 3 Process OMPI jobid: [21326,1] Process rank: 4 Process OMPI jobid: [21326,1] Process rank: 5 Process OMPI jobid: [21326,1] Process rank: 6 Process OMPI jobid: [21326,1] Process rank: 7 = [ipc:16986] *** Process received signal *** [ipc:16986] Signal: Segmentation fault (11) [ipc:16986] Signal code: Address not mapped (1) [ipc:16986] Failing at address: 0x801328268 [ipc:16986] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7ff85c7638f0] [ipc:16986] [ 1] /usr/lib/libopen-rte.so.0(+0x3459a) [0x7ff85d4a059a] [ipc:16986] [ 2] /usr/lib/libopen-pal.so.0(+0x1eeb8) [0x7ff85d233eb8] [ipc:16986] [ 3] /usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7ff85d228439] [ipc:16986] [ 4] /usr/lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0x9d) [0x7ff85d4a002d] [ipc:16986] [ 5] /usr/lib/openmpi/lib/openmpi/mca_plm_slurm.so(+0x211a) [0x7ff85bbc311a] [ipc:16986] [ 6] mpirun() [0x403c1f] [ipc:16986] [ 7] mpirun() [0x403014] [ipc:16986] [ 8] /lib/libc.so.6(__libc_start_main+0xfd) [0x7ff85c3efc4d] [ipc:16986] [ 9] mpirun() [0x402f39] [ipc:16986] *** End of error message *** Non-failing case: michael@eng-ipc4 ~ $ salloc -n8 -w ipc4 mpirun --display-map ./mpi JOB MAP Data for node: Name: eng-ipc4.FQDN Num procs: 8 Process OMPI jobid: [12467,1] Process rank: 0 
Process OMPI jobid: [12467,1] Process rank: 1 Process OMPI jobid: [12467,1] Process rank: 2 Process OMPI jobid: [12467,1] Process rank: 3 Process OMPI jobid: [12467,1] Process rank: 4 Process OMPI jobid: [12467,1] Process rank: 5 Process OMPI jobid: [12467,1] Process rank: 6 Process OMPI jobid: [12467,1] Process rank: 7 = Process 1 on eng-ipc4.FQDN out of 8 Process 3 on eng-ipc4.FQDN out of 8 Process 4 on eng-ipc4.FQDN out of 8 Process 6 on eng-ipc4.FQDN out of 8 Process 7 on eng-ipc4.FQDN out of 8 Process 0 on eng-ipc4.FQDN out of 8 Process 2 on eng-ipc4.FQDN out of 8 Process 5 on eng-ipc4.FQDN out of 8 Using mpi directly is fine: eg mpirun -H 'ipc3,ipc4' -np 8 ./mpi Works as expected This is a (small) homogenous cluster, all Xeon class machines with plenty of RAM and shared filesystem over NFS, running 64-bit Ubuntu server. I was running stock OpenMPI (1.4.1) and SLURM (2.1.1), I have since upgraded to latest stable OpenMPI (1.4.3) and SLURM (2.2.0), with no effect. (the newer binaries were compiled from the respective upstream Debian packages). strace (not shown) shows that the job is launched via srun and a connection is received back from the child process over TCP/IP. Soon after this, mpirun crashes. Nodes communicate over a semi-dedicated TCP/IP GigE connection. Is this a known bug? What is going wrong? Regards, Michael Curtis
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 27/01/2011, at 4:51 PM, Michael Curtis wrote: Some more debugging information: > Failing case: > michael@ipc ~ $ salloc -n8 mpirun --display-map ./mpi > JOB MAP Backtrace with debugging symbols #0 0x77bb5c1e in ?? () from /usr/lib/libopen-rte.so.0 #1 0x7792e23f in ?? () from /usr/lib/libopen-pal.so.0 #2 0x77920679 in opal_progress () from /usr/lib/libopen-pal.so.0 #3 0x77bb6e5d in orte_plm_base_daemon_callback () from /usr/lib/libopen-rte.so.0 #4 0x762b67e7 in plm_slurm_launch_job (jdata=) at ../../../../../../orte/mca/plm/slurm/plm_slurm_module.c:360 #5 0x004041c8 in orterun (argc=4, argv=0x7fffe7d8) at ../../../../../orte/tools/orterun/orterun.c:754 #6 0x00403234 in main (argc=4, argv=0x7fffe7d8) at ../../../../../orte/tools/orterun/main.c:13 Trace output with -d100 and --enable-trace: [:10821] progressed_wait: ../../../../../orte/mca/plm/base/plm_base_launch_support.c 459 [:10821] defining message event: ../../../../../orte/mca/plm/base/plm_base_launch_support.c 423 I'm guessing from this that it's crashing in the event loop, maybe at : static void process_orted_launch_report(int fd, short event, void *data) strace: poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}], 6, 1000) = 1 ([{fd=13, revents=POLLIN}]) readv(13, [{"R\333\0\0\377\377\377\377R\333\0\0\377\377\377\377R\333\0\0\0\0\0\0\0\0\0\4\0\0\0\232"..., 36}], 1) = 36 readv(13, [{"R\333\0\0\377\377\377\377R\333\0\0\0\0\0\0\0\0\0\n\0\0\0\1\0\0\0u1390"..., 154}], 1) = 154 poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}], 6, 0) = 0 (Timeout) --- SIGSEGV (Segmentation fault) @ 0 (0) --- OK, I matched the disassemblies and confirmed that the crash originates in process_orted_launch_report, and therefore matched up the source code line with where gdb reckons the program counter was at that point: /* update state */ 
pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING; Hopefully all this information helps a little!
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 28/01/2011, at 8:16 PM, Michael Curtis wrote: > > On 27/01/2011, at 4:51 PM, Michael Curtis wrote: > > Some more debugging information: Is anyone able to help with this problem? As far as I can tell it's a stock-standard, recently installed SLURM installation. I can try 1.5.1, but I am hesitant to deploy it as it would require a recompile of some rather large pieces of software. Should I re-post to the -devel lists? Regards,
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 04/02/2011, at 9:35 AM, Samuel K. Gutierrez wrote: > I just tried to reproduce the problem that you are experiencing and was > unable to. > > > SLURM 2.1.15 > Open MPI 1.4.3 configured with: > --with-platform=./contrib/platform/lanl/tlcc/debug-nopanasas > > I'll dig a bit further. Interesting. I'll try a local, vanilla (ie, non-debian) build and report back. Michael
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 04/02/2011, at 9:35 AM, Samuel K. Gutierrez wrote: Hi, > I just tried to reproduce the problem that you are experiencing and was > unable to. > > SLURM 2.1.15 > Open MPI 1.4.3 configured with: > --with-platform=./contrib/platform/lanl/tlcc/debug-nopanasas I compiled OpenMPI 1.4.3 (vanilla from source tarball) with the same platform file (the only change was to re-enable btl-tcp). Unfortunately, the result is the same: salloc -n16 ~/../openmpi/bin/mpirun --display-map ~/ServerAdmin/mpi salloc: Granted job allocation 145 JOB MAP Data for node: Name: eng-ipc4.{FQDN} Num procs: 8 Process OMPI jobid: [6932,1] Process rank: 0 Process OMPI jobid: [6932,1] Process rank: 1 Process OMPI jobid: [6932,1] Process rank: 2 Process OMPI jobid: [6932,1] Process rank: 3 Process OMPI jobid: [6932,1] Process rank: 4 Process OMPI jobid: [6932,1] Process rank: 5 Process OMPI jobid: [6932,1] Process rank: 6 Process OMPI jobid: [6932,1] Process rank: 7 Data for node: Name: ipc3 Num procs: 8 Process OMPI jobid: [6932,1] Process rank: 8 Process OMPI jobid: [6932,1] Process rank: 9 Process OMPI jobid: [6932,1] Process rank: 10 Process OMPI jobid: [6932,1] Process rank: 11 Process OMPI jobid: [6932,1] Process rank: 12 Process OMPI jobid: [6932,1] Process rank: 13 Process OMPI jobid: [6932,1] Process rank: 14 Process OMPI jobid: [6932,1] Process rank: 15 = [eng-ipc4:31754] *** Process received signal *** [eng-ipc4:31754] Signal: Segmentation fault (11) [eng-ipc4:31754] Signal code: Address not mapped (1) [eng-ipc4:31754] Failing at address: 0x8012eb748 [eng-ipc4:31754] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7f81ce4bf8f0] [eng-ipc4:31754] [ 1] ~/../openmpi/lib/libopen-rte.so.0(+0x7f869) [0x7f81cf262869] [eng-ipc4:31754] [ 2] ~/../openmpi/lib/libopen-pal.so.0(+0x22338) [0x7f81cef93338] [eng-ipc4:31754] [ 3] ~/../openmpi/lib/libopen-pal.so.0(+0x2297e) [0x7f81cef9397e] [eng-ipc4:31754] [ 4] ~/../openmpi/lib/libopen-pal.so.0(opal_event_loop+0x1f) [0x7f81cef9356f] [eng-ipc4:31754] [ 5] 
~/../openmpi/lib/libopen-pal.so.0(opal_progress+0x89) [0x7f81cef87916] [eng-ipc4:31754] [ 6] ~/../openmpi/lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0x13f) [0x7f81cf262e20] [eng-ipc4:31754] [ 7] ~/../openmpi/lib/libopen-rte.so.0(+0x84ed7) [0x7f81cf267ed7] [eng-ipc4:31754] [ 8] ~/../home/../openmpi/bin/mpirun() [0x403f46] [eng-ipc4:31754] [ 9] ~/../home/../openmpi/bin/mpirun() [0x402fb4] [eng-ipc4:31754] [10] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f81ce14bc4d] [eng-ipc4:31754] [11] ~/../openmpi/bin/mpirun() [0x402ed9] [eng-ipc4:31754] *** End of error message *** salloc: Relinquishing job allocation 145 salloc: Job allocation 145 has been revoked. zsh: exit 1 salloc -n16 ~/../openmpi/bin/mpirun --display-map ~/ServerAdmin/mpi I've anonymised the paths and domain, otherwise pasted verbatim. The only odd thing I notice is that the launching machine uses its full domain name, whereas the other machine is referred to by the short name. Despite the FQDN, the domain does not exist in the DNS (for historical reasons), but does exist in the /etc/hosts file. Any further clues would be appreciated. In case it may be relevant, core system versions are: glibc 2.11, gcc 4.4.3, kernel 2.6.32. One other point of difference may be that our environment is tcp (ethernet) based whereas the LANL test environment is not? Michael
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 07/02/2011, at 12:36 PM, Michael Curtis wrote:

> On 04/02/2011, at 9:35 AM, Samuel K. Gutierrez wrote:
>
> Hi,
>
>> I just tried to reproduce the problem that you are experiencing and was
>> unable to.
>>
>> SLURM 2.1.15
>> Open MPI 1.4.3 configured with:
>> --with-platform=./contrib/platform/lanl/tlcc/debug-nopanasas
>
> I compiled OpenMPI 1.4.3 (vanilla from source tarball) with the same platform
> file (the only change was to re-enable btl-tcp).
>
> Unfortunately, the result is the same:

To reply to my own post again (sorry!), I tried OpenMPI 1.5.1. This works fine:

salloc -n16 ~/../openmpi/bin/mpirun --display-map mpi
salloc: Granted job allocation 151

 JOB MAP 

 Data for node: ipc3  Num procs: 8
        Process OMPI jobid: [3365,1] Process rank: 0
        Process OMPI jobid: [3365,1] Process rank: 1
        Process OMPI jobid: [3365,1] Process rank: 2
        Process OMPI jobid: [3365,1] Process rank: 3
        Process OMPI jobid: [3365,1] Process rank: 4
        Process OMPI jobid: [3365,1] Process rank: 5
        Process OMPI jobid: [3365,1] Process rank: 6
        Process OMPI jobid: [3365,1] Process rank: 7

 Data for node: ipc4  Num procs: 8
        Process OMPI jobid: [3365,1] Process rank: 8
        Process OMPI jobid: [3365,1] Process rank: 9
        Process OMPI jobid: [3365,1] Process rank: 10
        Process OMPI jobid: [3365,1] Process rank: 11
        Process OMPI jobid: [3365,1] Process rank: 12
        Process OMPI jobid: [3365,1] Process rank: 13
        Process OMPI jobid: [3365,1] Process rank: 14
        Process OMPI jobid: [3365,1] Process rank: 15

Process 2 on eng-ipc3.{FQDN} out of 16
Process 4 on eng-ipc3.{FQDN} out of 16
Process 5 on eng-ipc3.{FQDN} out of 16
Process 0 on eng-ipc3.{FQDN} out of 16
Process 1 on eng-ipc3.{FQDN} out of 16
Process 6 on eng-ipc3.{FQDN} out of 16
Process 3 on eng-ipc3.{FQDN} out of 16
Process 7 on eng-ipc3.{FQDN} out of 16
Process 8 on eng-ipc4.{FQDN} out of 16
Process 11 on eng-ipc4.{FQDN} out of 16
Process 12 on eng-ipc4.{FQDN} out of 16
Process 14 on eng-ipc4.{FQDN} out of 16
Process 15 on eng-ipc4.{FQDN} out of 16
Process 10 on eng-ipc4.{FQDN} out of 16
Process 9 on eng-ipc4.{FQDN} out of 16
Process 13 on eng-ipc4.{FQDN} out of 16
salloc: Relinquishing job allocation 151

It does seem very much like there is a bug of some sort in 1.4.3?

Michael
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 09/02/2011, at 2:38 AM, Ralph Castain wrote: > Another possibility to check - are you sure you are getting the same OMPI > version on the backend nodes? When I see it work on local node, but fail > multi-node, the most common problem is that you are picking up a different > OMPI version due to path differences on the backend nodes. It's installed as a system package, and the software set on all machines is managed by a configuration tool, so the machines should be identical. However, it may be worth checking the dependency versions and I'll double check that the OMPI versions really do match.
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 09/02/2011, at 2:17 AM, Samuel K. Gutierrez wrote:

> Hi Michael,
>
> You may have tried to send some debug information to the list, but it appears
> to have been blocked. Compressed text output of the backtrace text is
> sufficient.

Odd, I thought I sent it to you directly. In any case, here is the backtrace and some information from gdb:

$ salloc -n16 gdb -args mpirun mpi
(gdb) run
Starting program: /mnt/f1/michael/openmpi/bin/mpirun /mnt/f1/michael/home/ServerAdmin/mpi
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x77b76869 in process_orted_launch_report (fd=-1, opal_event=1, data=0x681170) at base/plm_base_launch_support.c:342
342         pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING;
(gdb) bt
#0  0x77b76869 in process_orted_launch_report (fd=-1, opal_event=1, data=0x681170) at base/plm_base_launch_support.c:342
#1  0x778a7338 in event_process_active (base=0x615240) at event.c:651
#2  0x778a797e in opal_event_base_loop (base=0x615240, flags=1) at event.c:823
#3  0x778a756f in opal_event_loop (flags=1) at event.c:730
#4  0x7789b916 in opal_progress () at runtime/opal_progress.c:189
#5  0x77b76e20 in orte_plm_base_daemon_callback (num_daemons=2) at base/plm_base_launch_support.c:459
#6  0x77b7bed7 in plm_slurm_launch_job (jdata=0x610560) at plm_slurm_module.c:360
#7  0x00403f46 in orterun (argc=2, argv=0x7fffe7d8) at orterun.c:754
#8  0x00402fb4 in main (argc=2, argv=0x7fffe7d8) at main.c:13
(gdb) print pdatorted
$1 = (orte_proc_t **) 0x67c610
(gdb) print mev
$2 = (orte_message_event_t *) 0x681550
(gdb) print mev->sender.vpid
$3 = 4294967295
(gdb) print mev->sender
$4 = {jobid = 1721696256, vpid = 4294967295}
(gdb) print *mev
$5 = {super = {obj_magic_id = 16046253926196952813, obj_class = 0x77dd4f40, obj_reference_count = 1, cls_init_file_name = 0x77bb9a78 "base/plm_base_launch_support.c", cls_init_lineno = 423}, ev = 0x680850, sender = {jobid = 1721696256, vpid = 4294967295}, buffer = 0x6811b0, tag = 10, file = 0x680640 "rml_oob_component.c", line = 279}

That vpid looks suspiciously like -1.

Further debugging:

Breakpoint 3, orted_report_launch (status=32767, sender=0x7fffe170, buffer=0x77b1a85f, tag=32767, cbdata=0x612d20) at base/plm_base_launch_support.c:411
411         {
(gdb) print sender
$2 = (orte_process_name_t *) 0x7fffe170
(gdb) print *sender
$3 = {jobid = 6822016, vpid = 0}
(gdb) continue
Continuing.
--
A daemon (pid unknown) died unexpectedly with status 1 while attempting to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--

Program received signal SIGSEGV, Segmentation fault.
0x77b76869 in process_orted_launch_report (fd=-1, opal_event=1, data=0x681550) at base/plm_base_launch_support.c:342
342         pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING;
(gdb) print mev->sender
$4 = {jobid = 1778450432, vpid = 4294967295}

The daemon probably died as I spent too long thinking about my gdb input ;)
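The "suspiciously like -1" reading checks out: 4294967295 is UINT32_MAX, i.e. the bit pattern you get when -1 is stored in an unsigned 32-bit field (vpids are unsigned, and the INVALID sentinel values Ralph mentions below are likely defined as -1 cast to that type). A quick sanity check, using nothing beyond the standard library:

```python
import struct

vpid = 4294967295  # value gdb printed for mev->sender.vpid

# Reinterpret the unsigned 32-bit pattern as a signed 32-bit integer.
(signed,) = struct.unpack("<i", struct.pack("<I", vpid))
print(signed)             # -1
print(vpid == 2**32 - 1)  # True: this is UINT32_MAX
```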
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 09/02/2011, at 9:16 AM, Ralph Castain wrote: > See below > > > On Feb 8, 2011, at 2:44 PM, Michael Curtis wrote: > >> >> On 09/02/2011, at 2:17 AM, Samuel K. Gutierrez wrote: >> >>> Hi Michael, >>> >>> You may have tried to send some debug information to the list, but it >>> appears to have been blocked. Compressed text output of the backtrace text >>> is sufficient. >> >> >> Odd, I thought I sent it to you directly. In any case, here is the >> backtrace and some information from gdb: >> >> $ salloc -n16 gdb -args mpirun mpi >> (gdb) run >> Starting program: /mnt/f1/michael/openmpi/bin/mpirun >> /mnt/f1/michael/home/ServerAdmin/mpi >> [Thread debugging using libthread_db enabled] >> >> Program received signal SIGSEGV, Segmentation fault. >> 0x77b76869 in process_orted_launch_report (fd=-1, opal_event=1, >> data=0x681170) at base/plm_base_launch_support.c:342 >> 342 pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING; >> (gdb) bt >> #0 0x77b76869 in process_orted_launch_report (fd=-1, opal_event=1, >> data=0x681170) at base/plm_base_launch_support.c:342 >> #1 0x778a7338 in event_process_active (base=0x615240) at event.c:651 >> #2 0x778a797e in opal_event_base_loop (base=0x615240, flags=1) at >> event.c:823 >> #3 0x778a756f in opal_event_loop (flags=1) at event.c:730 >> #4 0x7789b916 in opal_progress () at runtime/opal_progress.c:189 >> #5 0x77b76e20 in orte_plm_base_daemon_callback (num_daemons=2) at >> base/plm_base_launch_support.c:459 >> #6 0x77b7bed7 in plm_slurm_launch_job (jdata=0x610560) at >> plm_slurm_module.c:360 >> #7 0x00403f46 in orterun (argc=2, argv=0x7fffe7d8) at >> orterun.c:754 >> #8 0x00402fb4 in main (argc=2, argv=0x7fffe7d8) at main.c:13 >> (gdb) print pdatorted >> $1 = (orte_proc_t **) 0x67c610 >> (gdb) print mev >> $2 = (orte_message_event_t *) 0x681550 >> (gdb) print mev->sender.vpid >> $3 = 4294967295 >> (gdb) print mev->sender >> $4 = {jobid = 1721696256, vpid = 4294967295} >> (gdb) print *mev >> $5 = {super = {obj_magic_id = 
16046253926196952813, obj_class = >> 0x77dd4f40, obj_reference_count = 1, cls_init_file_name = 0x77bb9a78 >> "base/plm_base_launch_support.c", >> cls_init_lineno = 423}, ev = 0x680850, sender = {jobid = 1721696256, vpid = >> 4294967295}, buffer = 0x6811b0, tag = 10, file = 0x680640 >> "rml_oob_component.c", line = 279}
>
> The jobid and vpid look like the defined INVALID values, indicating that
> something is quite wrong. This would quite likely lead to the segfault.
>
>> From this, it would indeed appear that you are getting some kind of library
> confusion - the most likely cause of such an error is a daemon from a
> different version trying to respond, and so the returned message isn't
> correct.
>
> Not sure why else it would be happening...you could try setting -mca
> plm_base_verbose 5 to get more debug output displayed on your screen,
> assuming you built OMPI with --enable-debug.

Found the problem! It is a site configuration issue, which I'll need to find a workaround for.

[bio-ipc.{FQDN}:27523] mca:base:select:( plm) Query of component [slurm] set priority to 75
[bio-ipc.{FQDN}:27523] mca:base:select:( plm) Selected component [slurm]
[bio-ipc.{FQDN}:27523] mca: base: close: component rsh closed
[bio-ipc.{FQDN}:27523] mca: base: close: unloading component rsh
[bio-ipc.{FQDN}:27523] plm:base:set_hnp_name: initial bias 27523 nodename hash 1936089714
[bio-ipc.{FQDN}:27523] plm:base:set_hnp_name: final jobfam 31383
[bio-ipc.{FQDN}:27523] [[31383,0],0] plm:base:receive start comm
[bio-ipc.{FQDN}:27523] [[31383,0],0] plm:slurm: launching job [31383,1]
[bio-ipc.{FQDN}:27523] [[31383,0],0] plm:base:setup_job for job [31383,1]
[bio-ipc.{FQDN}:27523] [[31383,0],0] plm:slurm: launching on nodes ipc3
[bio-ipc.{FQDN}:27523] [[31383,0],0] plm:slurm: final top-level argv:
        srun --nodes=1 --ntasks=1 --kill-on-bad-exit --nodelist=ipc3 orted -mca ess slurm -mca orte_ess_jobid 2056716288 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "2056716288.0;tcp://lanip:37493;tcp://globalip:37493;tcp://lanip2:37493" -mca plm_base_verbose 20

I then inserted some printf's into the ess_slurm_module (rough and ready, I know, but I was in a hurry). Just after initialisation (at around line 345):

orte_ess_slurm: jobid 2056716288 vpid 1

So it gets that... I narrowed it down to the get_slurm_nodename function, as the method didn't p
[OMPI users] RoCE (IBoE) & OpenMPI
I've been looking into OpenMPI's support for RoCE (Mellanox's recent Infiniband-over-Ethernet) lately. While it's promising, I've hit a snag: RoCE requires lossless ethernet, and on my switches the only way to guarantee this is with CoS. RoCE adapters cannot emit CoS priority tags unless the client program selects an IB service level and uses a non-default GID. There's a command-line option in OpenMPI to pick an IB SL, but I can't find one for picking a different GID. Does this exist for the openib btl? Or am I going about this the wrong way? -- Mike Shuey
Re: [OMPI users] RoCE (IBoE) & OpenMPI
It's a little different in RoCE. There's no subnet manager, so (as near as I can tell) you don't really have a subnet ID. Instead, the GID = GUID + VLAN tag (more or less). gid[0] has special bits in the VLAN tag section, to indicate that packets relating to this GID don't get a VLAN tag. Unfortunately, without a VLAN tag, those packets lack priority bits - meaning they can't be matched to a lossless class on our Cisco switches. RoCE HCAs keep a GID table, like normal HCAs. Every time you bring up a vlan interface, another entry gets automatically added to the table. If I select one of these other GIDs, packets get a VLAN tag, and that contains the necessary priority bits (well, assuming I selected the right IB service level, which is mapped to the priority tag in the VLAN header) for the traffic to match a lossless class of service on the switch. For this to work, I really need for the IB client to select a non-default GID. A few test programs included in OFED will do this, but I'm not sure OpenMPI will. Any thoughts? -- Mike Shuey On Fri, Feb 18, 2011 at 9:30 AM, Jeff Squyres wrote: > Greetings Mike. I'll answer today because Fri-Sat is the weekend in Israel > (i.e., the MPI team at Mellanox won't see this until Sunday). > > I don't have a lot of experience with RoCE; do you need a different GUID or a > different subnet ID? At least in IB, the GID = GUID + Subnet ID. The GUID > should be your unique port ID and the subnet ID is, well, the subnet ID. :-) > > Changing either of these in IB is an administrative function, not a > user-level function. Meaning: I'm *guessing* that the same is true for RoCE > -- changing the subnet ID (which is what I'm further guessing you need to do) > should be somewhere in the root-level setup for RoCE. Once you set a > different subnet ID, Open MPI should just use it. > > > On Feb 18, 2011, at 8:17 AM, Michael Shuey wrote: > >> I've been looking into OpenMPI's support for RoCE (Mellanox's recent >> Infiniband-over-Ethernet) lately. 
While it's promising, I've hit a >> snag: RoCE requires lossless ethernet, and on my switches the only way >> to guarantee this is with CoS. RoCE adapters cannot emit CoS priority >> tags unless the client program selects an IB service level and uses a >> non-default GID. >> >> There's a command-line option in OpenMPI to pick an IB SL, but I can't >> find one for picking a different GID. Does this exist for the openib >> btl? Or am I going about this the wrong way? >> >> -- >> Mike Shuey >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] RoCE (IBoE) & OpenMPI
Per-node GID & SL settings == bad. Site-wide GID & SL settings == good. If this could be an MCA param (like btl_openib_ib_service_level) that'd be great - we already have a global config file of similar params. We'd definitely want the same N everywhere. -- Mike Shuey On Fri, Feb 18, 2011 at 3:44 PM, Jeff Squyres wrote: > On Feb 18, 2011, at 1:39 PM, Michael Shuey wrote: > >> RoCE HCAs keep a GID table, like normal HCAs. Every time you bring up >> a vlan interface, another entry gets automatically added to the table. >> If I select one of these other GIDs, packets get a VLAN tag, and that >> contains the necessary priority bits (well, assuming I selected the >> right IB service level, which is mapped to the priority tag in the >> VLAN header) for the traffic to match a lossless class of service on >> the switch. > > Ah -- I see it now (it's been a looong time since I've looked in Open MPI's > verbs code!). We query and simply take the 0th GID from a given IBV device > port's GID table. > >> For this to work, I really need for the IB client to select a >> non-default GID. A few test programs included in OFED will do this, >> but I'm not sure OpenMPI will. Any thoughts? > > Yes, we can do this. It's pretty easy to add an MCA parameter to select the > Nth GID rather than always taking the 0th. > > To make this simple, can you make it so that the value of N is the same > across all nodes in your cluster? Then you can set a site-wide MCA param for > that value of N and be done with this issue. If we have to have a per-node > setting of N, it could get a little hairy (it's do-able, but... it's a > heckuva lot easier if N is the same everywhere). > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > >
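For reference, a site-wide MCA parameter like the one discussed here would normally live in `$prefix/etc/openmpi-mca-params.conf`, which Open MPI reads on every node. A sketch of what such a file might look like, assuming the existing `btl_openib_ib_service_level` parameter; the GID-index parameter name below is hypothetical, since the patch under discussion had not yet settled on one:

```
# Site-wide Open MPI MCA parameters ($prefix/etc/openmpi-mca-params.conf)

# IB service level; for RoCE the low bits end up in the VLAN priority field
btl_openib_ib_service_level = 3

# Hypothetical name for the "use the Nth GID" parameter being discussed;
# the same N must be valid on every node in the cluster
btl_openib_gid_index = 1
```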
Re: [OMPI users] RoCE (IBoE) & OpenMPI
Could you re-enable the SL param (btl_openib_ib_service_level) for RoCE? Jeff was kind enough to provide a patch to let me specify the gid_index, but that doesn't seem to be working. To get RoCE to work correctly (at least, on Nexus switches) I'll need to specify both a gid_index and an IB service level. I think. :-) Also, while the rdmacm connection manager is required for RoCE, it's not selected by default (like it is for iWARP). You still need to add that to a config file or command line, or you get a rather cryptic option (at least up through OpenMPI 1.5.1). -- Mike Shuey On Mon, Feb 21, 2011 at 12:34 PM, Jeff Squyres wrote: > Random thought: is there a check to ensure that the SL MCA param is not set > in a RoCE environment? If not, we should probably add a show_help warning if > the SL MCA param is set when using RoCE (i.e., that its value will be > ignored). > > > On Feb 19, 2011, at 12:22 AM, Shamis, Pavel wrote: > >> As far as I remember we don't allow to user to specify SL for RoCE. RoCE >> considered kinda ethernet device and RDMACM connection manager is used to >> setup the connections. it means that in order to select network X or Y, you >> may use ip/netmask (btl_openib_ipaddr_include) . >> >> Pavel (Pasha) Shamis >> --- >> Application Performance Tools Group >> Computer Science and Math Division >> Oak Ridge National Laboratory >> >> >> >> >> >> >> On Feb 18, 2011, at 4:14 PM, Michael Shuey wrote: >> >>> Per-node GID & SL settings == bad. Site-wide GID & SL settings == good. >>> >>> If this could be an MCA param (like btl_openib_ib_service_level) >>> that'd be great - we already have a global config file of similar >>> params. We'd definitely want the same N everywhere. >>> >>> -- >>> Mike Shuey >>> >>> >>> >>> On Fri, Feb 18, 2011 at 3:44 PM, Jeff Squyres wrote: >>>> On Feb 18, 2011, at 1:39 PM, Michael Shuey wrote: >>>> >>>>> RoCE HCAs keep a GID table, like normal HCAs. 
Every time you bring up >>>>> a vlan interface, another entry gets automatically added to the table. >>>>> If I select one of these other GIDs, packets get a VLAN tag, and that >>>>> contains the necessary priority bits (well, assuming I selected the >>>>> right IB service level, which is mapped to the priority tag in the >>>>> VLAN header) for the traffic to match a lossless class of service on >>>>> the switch. >>>> >>>> Ah -- I see it now (it's been a looong time since I've looked in Open >>>> MPI's verbs code!). We query and simply take the 0th GID from a given IBV >>>> device port's GID table. >>>> >>>>> For this to work, I really need for the IB client to select a >>>>> non-default GID. A few test programs included in OFED will do this, >>>>> but I'm not sure OpenMPI will. Any thoughts? >>>> >>>> Yes, we can do this. It's pretty easy to add an MCA parameter to select >>>> the Nth GID rather than always taking the 0th. >>>> >>>> To make this simple, can you make it so that the value of N is the same >>>> across all nodes in your cluster? Then you can set a site-wide MCA param >>>> for that value of N and be done with this issue. If we have to have a >>>> per-node setting of N, it could get a little hairy (it's do-able, but... >>>> it's a heckuva lot easier if N is the same everywhere). >>>> >>>> -- >>>> Jeff Squyres >>>> jsquy...@cisco.com >>>> For corporate legal information go to: >>>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>>> >>>> >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] RoCE (IBoE) & OpenMPI
Late yesterday I did have a chance to test the patch Jeff provided (against 1.4.3 - testing 1.5.x is on the docket for today). While it works, in that I can specify a gid_index, it doesn't do everything required - my traffic won't match a lossless CoS on the ethernet switch. Specifying a GID is only half of it; I really need to also specify a service level. The bottom 3 bits of the IB SL are mapped to ethernet's PCP bits in the VLAN tag. With a non-default gid, I can select an available VLAN (so RoCE's packets will include the PCP bits), but the only way to specify a priority is to use an SL. So far, the only RoCE-enabled app I've been able to make work correctly (such that traffic matches a lossless CoS on the switch) is ibv_rc_pingpong - and then, I need to use both a specific GID and a specific SL. The slides Pavel found seem a little misleading to me. The VLAN isn't determined by bound netdev; all VLAN netdevs map to the same IB adapter for RoCE. VLAN is determined by gid index. Also, the SL isn't determined by a set kernel policy; it's provided via the IB interfaces. As near as I can tell from Mellanox's documentation, OFED test apps, and the driver source, a RoCE adapter is an Infiniband card in almost all respects (even more so than an iWARP adapter). -- Mike Shuey On Wed, Feb 23, 2011 at 5:03 PM, Jeff Squyres wrote: > On Feb 23, 2011, at 3:54 PM, Shamis, Pavel wrote: > >> I remember that I updated the trunk to select by default RDMACM connection >> manager for RoCE ports - https://svn.open-mpi.org/trac/ompi/changeset/22311 >> >> I'm not sure it the change made his way to any production version. I don't >> work on this part code anymore :-) > > Mellanox -- can you follow up on this? 
> > Also, in addition to the patches I provided for selecting an arbitrary GID (I > was planning on committing them when Mike tested them at Purdue, but perhaps > I should just commit to the trunk anyway), perhaps we should check if a > non-default SL is supplied via MCA param in the RoCE case and output an > orte_show_help to warn that it will have no effect (i.e., principle of least > surprise and all that). > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
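The SL-to-PCP mapping Mike describes above ("the bottom 3 bits of the IB SL are mapped to ethernet's PCP bits in the VLAN tag") is just a 3-bit mask, which a few lines make concrete; the helper name is illustrative, not from any real API:

```python
# RoCE carries the low 3 bits of the IB service level (SL) in the
# ethernet PCP (priority) field of the VLAN tag, per the discussion above.
def sl_to_pcp(sl: int) -> int:
    return sl & 0b111  # only the bottom 3 bits survive

for sl in (0, 3, 5, 9):
    print(f"SL {sl} -> PCP {sl_to_pcp(sl)}")
```

SLs 0-7 map through unchanged, so picking an SL in that range effectively picks the switch's class of service directly.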
Re: [OMPI users] RoCE (IBoE) & OpenMPI
So, since RoCE has no SM, and setting an SL is required to get lossless ethernet on Cisco switches (and possibly others), does this mean that RoCE will never work correctly with OpenMPI on Cisco hardware? -- Mike Shuey On Tue, Mar 1, 2011 at 3:42 AM, Doron Shoham wrote: > Hi, > > Regarding to using a specific SL with RDMA CM, I've checked in the code and > it seems that RDMA_CM uses the SL from the SA. > So if you want to configure a specific SL, you need to do it via the SM. > > Doron > > -Original Message- > From: Jeff Squyres [mailto:jsquy...@cisco.com] > Sent: Thursday, February 24, 2011 3:45 PM > To: Michael Shuey > Cc: Open MPI Users , Mike Dubman > Subject: Re: [OMPI users] RoCE (IBoE) & OpenMPI > > On Feb 24, 2011, at 8:00 AM, Michael Shuey wrote: > >> Late yesterday I did have a chance to test the patch Jeff provided >> (against 1.4.3 - testing 1.5.x is on the docket for today). While it >> works, in that I can specify a gid_index, > > Great! I'll commit that to the trunk and start the process of moving it to > the v1.5.x series (I know you haven't tested it yet, but it's essentially the > same patch, just slightly adjusted for each of the 3 branches). > >> it doesn't do everything >> required - my traffic won't match a lossless CoS on the ethernet >> switch. Specifying a GID is only half of it; I really need to also >> specify a service level. > > RoCE requires the use of the RDMA CM (I think?), and I didn't think there was > a way to request a specific SL via the RDMA CM...? (I could certainly be > wrong here) > > I think Mellanox will need to follow up with these questions... > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] RoCE (IBoE) & OpenMPI
Honestly, I don't know - I haven't looked into the source. OFED 1.5.2 has a version of ibv_rc_pingpong that's been modified to work with RoCE; you can pass the gid_index and SL as command-line arguments. I'm not sure how that's handled at the IB layer, but the source may be a good place to start. -- Mike Shuey On Tue, Mar 1, 2011 at 9:14 AM, Jeff Squyres wrote: > I thought you mentioned in a prior email that you had gotten one or two other > OFED sample applications to work properly. How are they setting the SL? Are > they not using the RDMA CM? > > > On Mar 1, 2011, at 7:35 AM, Michael Shuey wrote: > >> So, since RoCE has no SM, and setting an SL is required to get >> lossless ethernet on Cisco switches (and possibly others), does this >> mean that RoCE will never work correctly with OpenMPI on Cisco >> hardware? >> >> -- >> Mike Shuey >> >> >> >> On Tue, Mar 1, 2011 at 3:42 AM, Doron Shoham wrote: >>> Hi, >>> >>> Regarding to using a specific SL with RDMA CM, I've checked in the code and >>> it seems that RDMA_CM uses the SL from the SA. >>> So if you want to configure a specific SL, you need to do it via the SM. >>> >>> Doron >>> >>> -Original Message- >>> From: Jeff Squyres [mailto:jsquy...@cisco.com] >>> Sent: Thursday, February 24, 2011 3:45 PM >>> To: Michael Shuey >>> Cc: Open MPI Users , Mike Dubman >>> Subject: Re: [OMPI users] RoCE (IBoE) & OpenMPI >>> >>> On Feb 24, 2011, at 8:00 AM, Michael Shuey wrote: >>> >>>> Late yesterday I did have a chance to test the patch Jeff provided >>>> (against 1.4.3 - testing 1.5.x is on the docket for today). While it >>>> works, in that I can specify a gid_index, >>> >>> Great! I'll commit that to the trunk and start the process of moving it to >>> the v1.5.x series (I know you haven't tested it yet, but it's essentially >>> the same patch, just slightly adjusted for each of the 3 branches). >>> >>>> it doesn't do everything >>>> required - my traffic won't match a lossless CoS on the ethernet >>>> switch. 
Specifying a GID is only half of it; I really need to also >>>> specify a service level. >>> >>> RoCE requires the use of the RDMA CM (I think?), and I didn't think there >>> was a way to request a specific SL via the RDMA CM...? (I could certainly >>> be wrong here) >>> >>> I think Mellanox will need to follow up with these questions... >>> >>> -- >>> Jeff Squyres >>> jsquy...@cisco.com >>> For corporate legal information go to: >>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>> >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] RDMACM Differences
Alternatively, if OpenMPI is really trying to use both ports, you could force it to use just one port with --mca btl_openib_if_include mlx4_0:1 (probably) -- Mike Shuey On Tue, Mar 1, 2011 at 1:02 PM, Jeff Squyres wrote: > On Feb 28, 2011, at 12:49 PM, Jagga Soorma wrote: > >> -bash-3.2$ mpiexec --mca btl openib,self -mca btl_openib_warn_default_gid_ >> prefix 0 -np 2 --hostfile mpihosts >> /home/jagga/osu-micro-benchmarks-3.3/openmpi/ofed-1.5.2/bin/osu_latency > > Your use of btl_openib_warn_default_gid_prefix may have brought up a subtle > issue in Open MPI's verbs support. More below. > >> # OSU MPI Latency Test v3.3 >> # Size Latency (us) >> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all] >> error modifing QP to RTR errno says Invalid argument >> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:815:rml_recv_cb] >> error in endpoint reply start connect > > Looking at this error message and your ibv_devinfo output: > >> [root@amber03 ~]# ibv_devinfo >> hca_id: mlx4_0 >> transport: InfiniBand (0) >> fw_ver: 2.7.9294 >> node_guid: 78e7:d103:0021:8884 >> sys_image_guid: 78e7:d103:0021:8887 >> vendor_id: 0x02c9 >> vendor_part_id: 26438 >> hw_ver: 0xB0 >> board_id: HP_020003 >> phys_port_cnt: 2 >> port: 1 >> state: PORT_ACTIVE (4) >> max_mtu: 2048 (4) >> active_mtu: 2048 (4) >> sm_lid: 1 >> port_lid: 20 >> port_lmc: 0x00 >> link_layer: IB >> >> port: 2 >> state: PORT_ACTIVE (4) >> max_mtu: 2048 (4) >> active_mtu: 1024 (3) >> sm_lid: 0 >> port_lid: 0 >> port_lmc: 0x00 >> link_layer: Ethernet > > It looks like you have 1 HCA port as IB and the other at Ethernet. > > I'm wondering if OMPI is not taking the device transport into account and is > *only* using the subnet ID to determine reachability (i.e., I'm wondering if > we didn't anticipate multiple devices/ports with the same subnet ID but with > different transports). I pointed this out to Mellanox yesterday; I think > they're following up on it. 
> > In the meantime, a workaround might be to set a non-default subnet ID on your > IB network. That should allow Open MPI to tell these networks apart without > additional help. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
[OMPI users] Conflicting versions of libgfortran.so with mpif90?
I do IT support for people who are using OpenMPI for research. However, they are reporting the following warnings when compiling code with mpif90:

/usr/bin/ld: warning: libgfortran.so.1, needed by /usr/lib64/openmpi/1.4-gcc/lib/libmpi_f90.so, may conflict with libgfortran.so.3

Running ldd on the resulting executable gives:

        libmpi_f90.so.0 => /usr/lib64/openmpi/1.4-gcc/lib/libmpi_f90.so.0 (0x2b5aac251000)
        libmpi_f77.so.0 => /usr/lib64/openmpi/1.4-gcc/lib/libmpi_f77.so.0 (0x2b5aac454000)
        libmpi.so.0 => /usr/lib64/openmpi/1.4-gcc/lib/libmpi.so.0 (0x003df360)
        libopen-rte.so.0 => /usr/lib64/openmpi/1.4-gcc/lib/libopen-rte.so.0 (0x003df1a0)
        libopen-pal.so.0 => /usr/lib64/openmpi/1.4-gcc/lib/libopen-pal.so.0 (0x003df1e0)
        libdl.so.2 => /lib64/libdl.so.2 (0x003df2e0)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x003df220)
        libutil.so.1 => /lib64/libutil.so.1 (0x003dff40)
        libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x2b5aac6a3000)
        libm.so.6 => /lib64/libm.so.6 (0x003df2a0)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x003e02c0)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x003df320)
        libc.so.6 => /lib64/libc.so.6 (0x003df260)
        libgfortran.so.1 => /usr/lib64/libgfortran.so.1 (0x2b5aac999000)
        /lib64/ld-linux-x86-64.so.2 (0x003df160)

It looks like there are attempts to link to two versions of libgfortran, which aren't compatible. I'm not familiar with OpenMPI myself, but the people using it would like to know how these warnings can be dealt with.

--
Michael Cugley
School of Engineering IT Support
m.cug...@eng.gla.ac.uk
Please direct IT support queries to itsupp...@eng.gla.ac.uk
Re: [OMPI users] Conflicting versions of libgfortran.so with mpif90? Solved!
On June 14, 2011 at 12:35 PM, Jeff Squyres wrote: Are they using a different version of gfortran to compile / link their application than what was used to compile / build Open MPI? FWIW: it's typically easier to use the same compilers to build Open MPI as the application. That did, in fact, turn out to be the problem. After some head scratching and the mighty Google, I found this page: https://www.scotgrid.ac.uk/wiki/index.php/Building_OPENMPI which (amongst other things) gave me the requisite flags to feed configure. Oddly, ldd on the resulting executable still shows references to libgfortran.so.3 and .so.1, but the warnings are gone and the user is happy, so I'm counting it as a victory. -- Michael Cugley School of Engineering IT Support m.cug...@eng.gla.ac.uk Please direct IT support queries to itsupp...@eng.gla.ac.uk
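For anyone hitting the same warnings: the fix Jeff describes amounts to rebuilding Open MPI with the same Fortran compiler the applications use. A sketch of what that looks like (paths and prefix are illustrative; this is the general recipe, not the exact scotgrid one):

```
# Rebuild Open MPI with the same gfortran the applications will use
./configure --prefix=/opt/openmpi/1.4-gcc \
    CC=gcc CXX=g++ F77=gfortran FC=gfortran
make all install

# Afterwards, the wrapper compiler should report the matching back-end:
mpif90 --showme:command
```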
[OMPI users] btl_openib_ipaddr_include broken in 1.4.4rc2?
I'm using RoCE (or rather, attempting to) and need to select a non-default GID to get my traffic properly classified. Both 1.4.4rc2 and 1.5.4 support the btl_openib_ipaddr_include option, but only 1.5.4 causes my traffic to use the proper GID and VLAN. Is there something broken with ipaddr_include in 1.4.4rc2? -- Mike Shuey
Re: [OMPI users] Directed to Undirected Graph
You need to use 2 calls. One option is an Allgather followed by an Allgatherv:

- Allgather() of one integer per rank: the number of nodes that rank is linked to
- Allgatherv() of a variable-size array of integers, where each entry is a linked-to node

On 06/05/2012 08:39 AM, Mudassar Majeed wrote:

Dear people, Let's say there are N MPI processes. Each MPI process has to communicate with some T processes, where T < N. This information is a directed graph (and every process knows only its own edges). I need to convert it to an undirected graph, so that each process will inform the T other processes about it. Every process will then update this information (which may be stored in an array of maximum size N). What is the best way to exchange this information among all MPI processes? MPI_AllGather and MPI_AllGatherv do not solve my problem.

best regards, -- Mudassar
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

-- Michael A. Raymond SGI MPT Team Leader (651) 683-3434
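The two-call pattern above can be sketched without any MPI installation at all: the following pure-Python simulation (ranks and link lists are made-up example data) shows exactly what each rank would contribute to MPI_Allgather and MPI_Allgatherv, and how every rank then symmetrizes the directed graph locally. With mpi4py the same logic maps onto comm.allgather / comm.Allgatherv.

```python
# Simulation of the Allgather + Allgatherv exchange (no MPI required).

def build_undirected(out_links):
    """out_links[r] = list of ranks r sends to (directed edges).

    Returns undirected[r] = sorted list of all neighbors of r."""
    nprocs = len(out_links)

    # Call 1 (MPI_Allgather): every rank contributes one integer,
    # the size of its own out-link list.
    counts = [len(out_links[r]) for r in range(nprocs)]

    # Call 2 (MPI_Allgatherv): every rank contributes its variable-size
    # out-link list; displacements are the prefix sums of the counts.
    displs = [0]
    for c in counts[:-1]:
        displs.append(displs[-1] + c)
    flat = [v for links in out_links for v in links]

    # Each rank now holds the full directed graph and symmetrizes it:
    # r's neighbors are its out-links plus every rank that links to r.
    undirected = [set(links) for links in out_links]
    for src in range(nprocs):
        for i in range(counts[src]):
            dst = flat[displs[src] + i]
            undirected[dst].add(src)
    return [sorted(s) for s in undirected]

if __name__ == "__main__":
    # rank 0 -> 1, rank 1 -> 2, rank 2 -> 0 (a directed 3-cycle)
    print(build_undirected([[1], [2], [0]]))  # [[1, 2], [0, 2], [0, 1]]
```

In real MPI code the `counts` array returned by the first call is exactly what MPI_Allgatherv needs for its recvcounts/displs arguments.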
[OMPI users] mpivars.sh - Intel Fortran 13.1 conflict with OpenMPI 1.6.3
This is for reference and suggestions, as this took me several hours to track down and the previous discussion on "mpivars.sh" failed to cover this point (nothing in the FAQ). I successfully built and installed OpenMPI 1.6.3 using the following on Debian Linux:

./configure --prefix=/opt/openmpi/intel131 --disable-ipv6 --with-mpi-f90-size=medium --with-f90-max-array-dim=4 --disable-vt F77=/opt/intel/composer_xe_2013.1.117/bin/intel64/ifort FC=/opt/intel/composer_xe_2013.1.117/bin/intel64/ifort CXXFLAGS=-m64 CFLAGS=-m64 CC=gcc CXX=g++

(--disable-vt was required because of an error finding -lz, which I gave up on.)

My .tcshrc file HAD the following:

set path = (/opt/openmpi/intel131/bin $path)
setenv LD_LIBRARY_PATH /opt/openmpi/intel131/lib:$LD_LIBRARY_PATH
setenv MANPATH /opt/openmpi/intel131/share/man:$MANPATH
alias mpirun "mpirun --prefix /opt/openmpi/intel131 "
source /opt/intel/composer_xe_2013.1.117/bin/compilervars.csh intel64

For years I have used these procedures on Debian Linux and OS X with earlier versions of OpenMPI and Intel Fortran.
However, at some point Intel Fortran started including "mpirt", including:

/opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpirun

So even though I have the alias set for mpirun, I got the following error:

> mpirun -V
.: 131: Can't open /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpivars.sh

Part of the confusion is that the OpenMPI source does include a reference to "mpivars" in "contrib/dist/linux/openmpi.spec".

The solution only occurred to me as I was writing this up: source the Intel setup first:

source /opt/intel/composer_xe_2013.1.117/bin/compilervars.csh intel64
set path = (/opt/openmpi/intel131/bin $path)
setenv LD_LIBRARY_PATH /opt/openmpi/intel131/lib:$LD_LIBRARY_PATH
setenv MANPATH /opt/openmpi/intel131/share/man:$MANPATH
alias mpirun "mpirun --prefix /opt/openmpi/intel131 "

Now I finally get:

> mpirun -V
mpirun (Open MPI) 1.6.3

The MPI runtime should be in the redistributable for Intel's MPI compiler, not in the base compiler. The question is how much of /opt/intel/composer_xe_2013.1.117/mpirt I can eliminate safely, and should I? (This is a multi-user machine where each user has their own Intel license, so I don't wish to troubleshoot this in the future.)
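The underlying mechanism is just PATH resolution: the shell runs the first mpirun it finds, so whichever setup script prepends its bin/ directory last wins. The following small demonstration (with hypothetical stand-in directories, not the real Intel/Open MPI paths) shows the effect using Python's shutil.which:

```python
# Demonstrate that PATH order decides which of two same-named
# executables is found -- the root cause of the mpirt/mpirun clash.
# Directory names are hypothetical stand-ins for the real install paths.
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    intel_bin = os.path.join(tmp, "intel_mpirt_bin")  # stand-in for .../mpirt/bin/intel64
    ompi_bin = os.path.join(tmp, "openmpi_bin")       # stand-in for /opt/openmpi/intel131/bin
    for d in (intel_bin, ompi_bin):
        os.mkdir(d)
        stub = os.path.join(d, "mpirun")
        with open(stub, "w") as f:
            f.write("#!/bin/sh\necho %s\n" % os.path.basename(d))
        os.chmod(stub, 0o755)  # make the stub executable

    # Wrong order: Intel's directory ends up ahead of Open MPI's.
    path = os.pathsep.join([intel_bin, ompi_bin])
    first = shutil.which("mpirun", path=path)
    assert first == os.path.join(intel_bin, "mpirun")

    # Fixed order: source compilervars FIRST, then prepend Open MPI's
    # bin, so Open MPI's mpirun shadows Intel's mpirt copy.
    path = os.pathsep.join([ompi_bin, intel_bin])
    first = shutil.which("mpirun", path=path)
    assert first == os.path.join(ompi_bin, "mpirun")
    print("PATH order decides which mpirun runs")
```

In a real shell, `type -a mpirun` (or `which -a mpirun`) lists every match on PATH in resolution order and makes the same diagnosis immediately.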
Re: [OMPI users] mpirun error
The Intel Fortran 2013 compiler comes with support for Intel's MPI runtime and you are getting that instead of OpenMPI. You need to fix your path for all the shells you use. On Apr 1, 2013, at 5:12 AM, Pradeep Jha wrote: > /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpirun: line 96: > /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpivars.sh: No such file > or directory
[OMPI users] How to select specific out of multiple interfaces for communication and support for heterogeneous fabrics
Hello OpenMPI,

We are seriously considering deploying OpenMPI 1.6.5 for production (and 1.7.2 for testing) on HPC clusters which consist of nodes with *different types of networking interfaces*.

1) Interface selection

We are using OpenMPI 1.6.5 and were wondering how one would go about selecting *at run time* which networking interface to use for MPI communications when IB, 10GigE and 1GigE are all present. This issue arises in a cluster with nodes that are equipped with different types of interfaces: *some* have both IB (QDR or FDR) and 10- and 1-GigE, others have *only* 10-GigE and 1-GigE, and others only 1-GigE.

2) OpenMPI 1.6.5 level of support for heterogeneous fabrics

Can OpenMPI support running an MPI application using a mix of nodes with all of the above networking interface combinations?

2.a) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run on nodes with QDR IB and another subset on FDR IB simultaneously? These are Mellanox QDR and FDR HCAs. Mellanox mentioned to us that they support both QDR and FDR HCAs attached to the same IB subnet. Do you think MVAPICH2 will have any issue with this?

2.b) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run on nodes with IB and another subset over 10GigE simultaneously? That is, imagine nodes I1, I2, ..., IN having, say, QDR HCAs and nodes G1, G2, ..., GM having only 10GigE interfaces. Could the same MPI application run across both types of nodes? Or should there be, say, 2 communicators, with one of them explicitly overlaid on an IB-only subnet and the other on a 10GigE-only subnet?

Please let me know if the above are not very clear.

Thank you much
Re: [OMPI users] How to select specific out of multiple interfaces for communication and support for heterogeneous fabrics
Sorry about the mvapich2 reference :)

All nodes are attached over a common 1GigE network. We wish, of course, that if a node pair is connected via a higher-speed fabric *as well* (IB FDR or 10GigE) then this would be leveraged instead of the common 1GigE.

One question: suppose that we use nodes having either FDR or QDR IB interfaces, connected to one common IB fabric, all defined over a common IP subnet: will OpenMPI have any problem with this? Can MPI communication take place over this type of hybrid IB fabric? We already have a sub-cluster with QDR HCAs and we are attaching it to an IB fabric with an FDR "backbone" and another cluster with FDR HCAs. Do you think there may be some issue with this? The HCAs are FDR and QDR Mellanox devices and the switching is also over FDR Mellanox fabric. Mellanox claims that at the IB level this is doable (i.e., FDR link pairs talk to each other at FDR speeds and QDR link pairs at QDR). I guess if we use the RC connection types then it does not matter to OpenMPI.

thanks .... Michael

On Fri, Jul 5, 2013 at 4:59 PM, Ralph Castain wrote:
> I can't speak for MVAPICH - you probably need to ask them about this
> scenario. OMPI will automatically select whatever available transport
> can reach the intended process. This requires that each communicating pair
> of processes have access to at least one common transport.
>
> So if a process that is on a node with only 1G-E wants to communicate with
> another process, then the node where that other process is running must
> also have access to a compatible Ethernet interface (1G can talk to 10G, so
> they can have different capabilities) on that subnet (or on a subnet that
> knows how to route to the other one). If both nodes have 10G-E as well as
> 1G-E interfaces, then OMPI will automatically take the 10G interface as it
> is the faster of the two.
> > Note this means that if a process is on a node that only has IB, and wants > to communicate to a process on a node that only has 1G-E, then the two > processes cannot communicate. > > HTH > Ralph > > On Jul 5, 2013, at 2:34 PM, Michael Thomadakis > wrote: > > Hello OpenMPI > > We area seriously considering deploying OpenMPI 1.6.5 for production (and > 1.7.2 for testing) on HPC clusters which consists of nodes with *different > types of networking interfaces*. > > > 1) Interface selection > > We are using OpenMPI 1.6.5 and was wondering how one would go about > selecting* at run time* which networking interface to use for MPI > communications in case that both IB, 10GigE and 1 GigE are present. > > This issues arises in a cluster with nodes that are equipped with > different types of interfaces: > > *Some *have both IB-QDR or FDR and 10- and 1-GigE. Others *only* have > 10-GigE and 1-GigE and simply others only 1-GigE. > > > 2) OpenMPI 1.6.5 level of support for Heterogeneous Fabric > > Can OpenMPI support running an MPI application using a mix of nodes with > all of the above networking interface combinations ? > > 2.a) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run > on nodes with QDR IB and another subset on FDR IB simultaneously? These are > Mellanox QDR and FDR HCAs. > > Mellanox mentioned to us that they support both QDR and FDR HCAs attached > to the same IB subnet. Do you think MVAPICH2 will have any issue with this? > > 2.b) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run > on nodes with IB and another subset over 10GiGE simultaneously? > > That is imagine nodes I1, I2, ..., IN having say QDR HCAs and nodes G1, > G2, GM having only 10GigE interfaces. Could we have the same MPI > application run across both types of nodes? > > Or should there be say 2 communicators with one of them explicitly > overlaid on a IB only subnet and the other on a 10GigE only subnet? 
> > > Please let me know if the above are not very clear. > > Thank you much > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
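For readers wanting to steer the selection rather than rely on the automatic choice: Open MPI exposes this through MCA parameters, settable on the mpirun command line or in a per-user defaults file. A sketch of such a file follows; the transport list is standard for the 1.6 series, but the interface name is a site-specific assumption you must adapt:

```ini
# $HOME/.openmpi/mca-params.conf -- per-user MCA defaults.
# The same settings can be given as "mpirun --mca <key> <value> ...".

# Allowed transports, in the usual 1.6-series form:
# loopback, shared memory, OpenFabrics (IB/RoCE), and TCP fallback.
btl = self,sm,openib,tcp

# When the TCP BTL is used, restrict it to the 10GigE interface
# (eth2 here is an example name -- check "ip link" on your nodes).
btl_tcp_if_include = eth2
```

The command-line equivalent would be, e.g., `mpirun --mca btl_tcp_if_include eth2 ...`, which is handy for per-job experiments before committing a cluster-wide default.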
Re: [OMPI users] How to select specific out of multiple interfaces for communication and support for heterogeneous fabrics
Great ... thanks. We will try it out as soon as the common backbone IB is in place. cheers Michael On Fri, Jul 5, 2013 at 6:10 PM, Ralph Castain wrote: > As long as the IB interfaces can communicate to each other, you should be > fine. > > On Jul 5, 2013, at 3:26 PM, Michael Thomadakis > wrote: > > Sorry on the mvapich2 reference :) > > All nodes are attached over a common 1GigE network. We wish ofcourse that > if a node-pair is connected via a higher-speed fabric *as well* (IB FDR > or 10GigE) then that this would be leveraged instead of the common 1GigE. > > One question: suppose that we use nodes having either FDR or QDR IB > interfaces available, connected to one common IB fabric, all defined over a > common IP subnet: Will OpenMPI have any problem with this? Can MPI > communication take place over this type of hybrid IB fabric? We already > have a sub-cluster with QDR HCAs and we are attaching it to IB fabric with > FDR "backbone" and another cluster with FDR HCAs. > > Do you think there may be some issue with this? The HCAs are FDR and QDR > Mellanox devices and the switching is also over FDR Mellanox fabric. > Mellanox claims that at the IB level this is doable (i.e., FDR link pairs > talk to each other at FDR speeds and QDR link pairs at QDR). > > I guess if we use the RC connection types then it does not matter to > OpenMPI. > > thanks > Michael > > > > > On Fri, Jul 5, 2013 at 4:59 PM, Ralph Castain wrote: > >> I can't speak for MVAPICH - you probably need to ask them about this >> scenario. OMPI will automatically select whatever available transport that >> can reach the intended process. This requires that each communicating pair >> of processes have access to at least one common transport. 
>> >> So if a process that is on a node with only 1G-E wants to communicate >> with another process, then the node where that other process is running >> must also have access to a compatible Ethernet interface (1G can talk to >> 10G, so they can have different capabilities) on that subnet (or on a >> subnet that knows how to route to the other one). If both nodes have 10G-E >> as well as 1G-E interfaces, then OMPI will automatically take the 10G >> interface as it is the faster of the two. >> >> Note this means that if a process is on a node that only has IB, and >> wants to communicate to a process on a node that only has 1G-E, then the >> two processes cannot communicate. >> >> HTH >> Ralph >> >> On Jul 5, 2013, at 2:34 PM, Michael Thomadakis >> wrote: >> >> Hello OpenMPI >> >> We area seriously considering deploying OpenMPI 1.6.5 for production (and >> 1.7.2 for testing) on HPC clusters which consists of nodes with *different >> types of networking interfaces*. >> >> >> 1) Interface selection >> >> We are using OpenMPI 1.6.5 and was wondering how one would go about >> selecting* at run time* which networking interface to use for MPI >> communications in case that both IB, 10GigE and 1 GigE are present. >> >> This issues arises in a cluster with nodes that are equipped with >> different types of interfaces: >> >> *Some *have both IB-QDR or FDR and 10- and 1-GigE. Others *only* have >> 10-GigE and 1-GigE and simply others only 1-GigE. >> >> >> 2) OpenMPI 1.6.5 level of support for Heterogeneous Fabric >> >> Can OpenMPI support running an MPI application using a mix of nodes with >> all of the above networking interface combinations ? >> >> 2.a) Can the same MPI code (SPMD or MPMD) have a subset of its ranks >> run on nodes with QDR IB and another subset on FDR IB simultaneously? These >> are Mellanox QDR and FDR HCAs. >> >> Mellanox mentioned to us that they support both QDR and FDR HCAs attached >> to the same IB subnet. 
Do you think MVAPICH2 will have any issue with this? >> >> 2.b) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run >> on nodes with IB and another subset over 10GiGE simultaneously? >> >> That is imagine nodes I1, I2, ..., IN having say QDR HCAs and nodes G1, >> G2, GM having only 10GigE interfaces. Could we have the same MPI >> application run across both types of nodes? >> >> Or should there be say 2 communicators with one of them explicitly >> overlaid on a IB only subnet and the other on a 10GigE only subnet? >> >> >> Please let me know if the above are not very clear. >> >> Thank you much >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
[OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Hello OpenMPI,

I am wondering what level of support there is for CUDA and GPUdirect in OpenMPI 1.6.5 and 1.7.2. I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, it seems that configure in v1.6.5 ignored it. Can you identify GPU memory and send messages from it directly, without copying to host memory first? Or, in general, what level of CUDA support is there in 1.6.5 and 1.7.2? Do you support SDK 5.0 and above?

Cheers ... Michael
[OMPI users] Question on handling of memory for communications
Hello OpenMPI,

When your stack runs on Sandy Bridge nodes attached to HCAs over PCIe *gen 3*, do you pay any special attention to the memory buffers according to which socket/memory controller their physical memory belongs to? For instance, if the HCA is attached to the PCIe gen 3 lanes of socket 1, do you do anything special when the read/write buffers map to physical memory belonging to socket 2? Or do you avoid using buffers mapping to memory that belongs to (is accessible via) the other socket? Has this situation improved with Ivy Bridge or Haswell systems?

Cheers, Michael
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
thanks, Do you guys have any plan to support Intel Phi in the future? That is, running MPI code on the Phi cards or across the multicore and Phi, as Intel MPI does? thanks... Michael On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote: > Rolf will have to answer the question on level of support. The CUDA code > is not in the 1.6 series as it was developed after that series went > "stable". It is in the 1.7 series, although the level of support will > likely be incrementally increasing as that "feature" series continues to > evolve. > > > On Jul 6, 2013, at 12:06 PM, Michael Thomadakis > wrote: > > > Hello OpenMPI, > > > > I am wondering what level of support is there for CUDA and GPUdirect on > OpenMPI 1.6.5 and 1.7.2. > > > > I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, > it seems that with configure v1.6.5 it was ignored. > > > > Can you identify GPU memory and send messages from it directly without > copying to host memory first? > > > > > > Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2 ? > Do you support SDK 5.0 and above? > > > > Cheers ... > > Michael > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Question on handling of memory for communications
Hi Jeff, thanks for the reply.

The issue is that when you read or write PCIe gen 3 data to non-local NUMA memory, Sandy Bridge will use the inter-socket QPI links to get the data across to the other socket. I think there is a considerable limitation on PCIe I/O traffic going over the inter-socket QPI. One way to get around this, for reads, is to buffer all data into memory local to the same socket and then copy it across to the other socket's physical memory in code. For writes the same approach can be used, with an intermediary process copying the data.

I was wondering if OpenMPI does any special memory mapping to work around this. And if with Ivy Bridge (or Haswell) the situation has improved.

thanks
Mike

On Mon, Jul 8, 2013 at 9:57 AM, Jeff Squyres (jsquyres) wrote:
> On Jul 6, 2013, at 4:59 PM, Michael Thomadakis
> wrote:
>
> > When you stack runs on SandyBridge nodes atached to HCAs ove PCI3 gen 3
> do you pay any special attention to the memory buffers according to which
> socket/memory controller their physical memory belongs to?
> >
> > For instance, if the HCA is attached to the PCIgen3 lanes of Socket 1 do
> you do anything special when the read/write buffers map to physical memory
> belonging to Socket 2? Or do you7 avoid using buffers mapping ro memory
> that belongs (is accessible via) the other socket?
>
> It is not *necessary* to ensure that buffers are NUMA-local to the PCI
> device that they are writing to, but it certainly results in lower latency
> to read/write to PCI devices (regardless of flavor) that are attached to an
> MPI process' local NUMA node. The Hardware Locality (hwloc) tool "lstopo"
> can print a pretty picture of your server to show you where your PCI busses
> are connected.
>
> For TCP, Open MPI will use all TCP devices that it finds by default
> (because it is assumed that latency is so high that NUMA locality doesn't
> matter).
The openib (OpenFabrics) transport will use the "closest" HCA > ports that it can find to each MPI process. > > In our upcoming Cisco ultra low latency BTL, it defaults to using the > closest Cisco VIC ports that it can find for short messages (i.e., to > minimize latency), but uses all available VICs for long messages (i.e., to > maximize bandwidth). > > > Has this situation improved with Ivy-Brige systems or Haswell? > > It's the same overall architecture (i.e., NUMA). > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Thanks ... Michael

On Mon, Jul 8, 2013 at 8:50 AM, Rolf vandeVaart wrote:

> With respect to the CUDA-aware support, Ralph is correct. The ability to
> send and receive GPU buffers is in the Open MPI 1.7 series. And
> incremental improvements will be added to the Open MPI 1.7 series. CUDA
> 5.0 is supported.
>
> *From:* users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] *On
> Behalf Of *Ralph Castain
> *Sent:* Saturday, July 06, 2013 5:14 PM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI
> 1.6.5 an 1.7.2
>
> There was discussion of this on a prior email thread on the OMPI devel
> mailing list:
>
> http://www.open-mpi.org/community/lists/devel/2013/05/12354.php
>
> On Jul 6, 2013, at 2:01 PM, Michael Thomadakis
> wrote:
>
> thanks,
> Do you guys have any plan to support Intel Phi in the future? That is,
> running MPI code on the Phi cards or across the multicore and Phi, as Intel
> MPI does?
> thanks...
> Michael
>
> On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote:
>
> Rolf will have to answer the question on level of support. The CUDA code
> is not in the 1.6 series as it was developed after that series went
> "stable". It is in the 1.7 series, although the level of support will
> likely be incrementally increasing as that "feature" series continues to
> evolve.
>
> On Jul 6, 2013, at 12:06 PM, Michael Thomadakis
> wrote:
>
> > Hello OpenMPI,
> >
> > I am wondering what level of support is there for CUDA and GPUdirect on
> OpenMPI 1.6.5 and 1.7.2.
> >
> > I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However,
> it seems that with configure v1.6.5 it was ignored.
> >
> > Can you identify GPU memory and send messages from it directly without
> copying to host memory first?
> >
> > Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2 ?
> > Do you support SDK 5.0 and above?
> >
> > Cheers ...
> > Michael
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential information. Any unauthorized review, use,
> disclosure or distribution is prohibited. If you are not the intended
> recipient, please contact the sender by reply email and destroy all copies
> of the original message.
> --
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
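For readers following this thread later: the --with-cuda flag is the one from the FAQ mentioned above, and per Ralph's and Rolf's replies it is only acted on by the 1.7 ("feature") series. A build sketch follows; the prefix and CUDA paths are examples, not canonical locations:

```
# Hypothetical paths -- adjust prefix and CUDA toolkit location.
# On the 1.6 series this flag is silently ignored, as observed above.
./configure --prefix=/opt/openmpi/1.7.2-cuda \
            --with-cuda=/usr/local/cuda-5.0
make all install
```

After installation, `ompi_info` can be used to inspect what the build was configured with.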
Re: [OMPI users] Question on handling of memory for communications
People have mentioned that they experience unexpected slowdowns in PCIe gen 3 I/O when the pages map to a socket different from the one the HCA connects to. It is speculated that the inter-socket QPI is not provisioned to transfer more than 1 GiB/sec of PCIe gen 3 traffic. This situation may not be in effect on all Sandy Bridge or Ivy Bridge systems.

Have you measured anything like this on your systems as well? That would require using physical memory mapped to the socket without the HCA exclusively for MPI messaging.

Mike

On Mon, Jul 8, 2013 at 10:52 AM, Jeff Squyres (jsquyres) wrote:
> On Jul 8, 2013, at 11:35 AM, Michael Thomadakis
> wrote:
>
> > The issue is that when you read or write PCIe_gen 3 dat to a non-local
> NUMA memory, SandyBridge will use the inter-socket QPIs to get this data
> across to the other socket. I think there is considerable limitation in
> PCIe I/O traffic data going over the inter-socket QPI. One way to get
> around this is for reads to buffer all data into memory space local to the
> same socket and then transfer them by code across to the other socket's
> physical memory. For writes the same approach can be used with intermediary
> process copying data.
>
> Sure, you'll cause congestion across the QPI network when you do non-local
> PCI reads/writes. That's a given.
>
> But I'm not aware of a hardware limitation on PCI-requested traffic across
> QPI (I could be wrong, of course -- I'm a software guy, not a hardware
> guy). A simple test would be to bind an MPI process to a far NUMA node and
> run a simple MPI bandwidth test and see if you get better/same/worse
> bandwidth compared to binding an MPI process on a near NUMA socket.
>
> But in terms of doing intermediate (pipelined) reads/writes to local NUMA
> memory before reading/writing to PCI, no, Open MPI does not do this.
> Unless there is a PCI-QPI bandwidth constraint that we're unaware of, I'm > not sure why you would do this -- it would likely add considerable > complexity to the code and it would definitely lead to higher overall MPI > latency. > > Don't forget that the MPI paradigm is for the application to provide the > send/receive buffer. Meaning: MPI doesn't (always) control where the > buffer is located (particularly for large messages). > > > I was wondering if OpenMPI does anything special memory mapping to work > around this. > > Just what I mentioned in the prior email. > > > And if with Ivy Bridge (or Haswell) he situation has improved. > > Open MPI doesn't treat these chips any different. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
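Jeff's suggested experiment (bind near vs. far and compare) can be approximated on a single node without MPI: time a large memory copy, then rerun the script under different NUMA bindings and compare the numbers. The script below is a crude sketch; the numactl invocations in the comment are an assumption about a typical Linux setup and are not executed by the script itself.

```python
# Crude memory-copy bandwidth probe. Run it twice under numactl to
# compare near vs. far NUMA placement (commands below are assumptions
# for a typical 2-socket Linux box, not part of this script):
#   numactl --cpunodebind=0 --membind=0 python membw.py
#   numactl --cpunodebind=1 --membind=1 python membw.py
import time

def copy_bandwidth(nbytes=64 * 1024 * 1024, reps=5):
    """Return the best-of-reps bandwidth of a plain buffer copy, in MB/s."""
    src = bytearray(nbytes)
    best = 0.0
    for _ in range(reps):
        t0 = time.perf_counter()
        dst = bytes(src)          # one full copy of the buffer
        dt = time.perf_counter() - t0
        best = max(best, nbytes / dt / 1e6)
        del dst
    return best

if __name__ == "__main__":
    print("copy bandwidth: %.0f MB/s" % copy_bandwidth())
```

This only exercises the memory subsystem, not PCIe; for the actual HCA path, an MPI pingpong (e.g., IMB, as Brice used) bound near and far with `mpirun --bind-to ...` or numactl is the more faithful test.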
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Thanks Tom, that sounds good. I will give it a try as soon as our Phi host here gets installed.

I assume that all the prerequisite libs and bins on the Phi side are available when we download the Phi s/w stack from Intel's site, right?

Cheers
Michael

On Mon, Jul 8, 2013 at 12:10 PM, Elken, Tom wrote:
> Do you guys have any plan to support Intel Phi in the future? That is,
> running MPI code on the Phi cards or across the multicore and Phi, as Intel
> MPI does?
>
> *[Tom]*
> Hi Michael,
> Because a Xeon Phi card acts a lot like a Linux host with an x86
> architecture, you can build your own Open MPI libraries to serve this
> purpose.
> Our team has used existing (an older 1.4.3 version of) Open MPI source to
> build an Open MPI for running MPI code on Intel Xeon Phi cards over Intel's
> (formerly QLogic's) True Scale InfiniBand fabric, and it works quite well.
> We have not released a pre-built Open MPI as part of any Intel software
> release. But I think if you have a compiler for Xeon Phi (Intel Compiler
> or GCC) and an interconnect for it, you should be able to build an Open MPI
> that works on Xeon Phi.
>
> Cheers,
> Tom Elken
>
> thanks...
> Michael
>
> On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote:
>
> Rolf will have to answer the question on level of support. The CUDA code
> is not in the 1.6 series as it was developed after that series went
> "stable". It is in the 1.7 series, although the level of support will
> likely be incrementally increasing as that "feature" series continues to
> evolve.
>
> On Jul 6, 2013, at 12:06 PM, Michael Thomadakis
> wrote:
>
> > Hello OpenMPI,
> >
> > I am wondering what level of support is there for CUDA and GPUdirect on
> OpenMPI 1.6.5 and 1.7.2.
> >
> > I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However,
> it seems that with configure v1.6.5 it was ignored.
> >
> > Can you identify GPU memory and send messages from it directly without
> copying to host memory first?
> >
> > Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2 ?
> > Do you support SDK 5.0 and above?
> >
> > Cheers ...
> > Michael
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Question on handling of memory for communications
Hi Brice, thanks for testing this out.

How did you make sure that the pinned pages used by the I/O adapter mapped to the "other" socket's memory controller? Is pinning the MPI binary to a socket sufficient to pin the space used for MPI I/O to that socket as well? I think this is something done by, and at, the HCA device driver level.

Anyway, as long as the memory performance difference is at the levels you mentioned, there is no "big" issue. Most likely the device driver gets space from the same NUMA domain as the socket the HCA is attached to.

Thanks for trying it out,
Michael

On Mon, Jul 8, 2013 at 11:45 AM, Brice Goglin wrote:
> On a dual E5 2650 machine with FDR cards, I see the IMB Pingpong
> throughput drop from 6000 to 5700MB/s when the memory isn't allocated on
> the right socket (and latency increases from 0.8 to 1.4us). Of course
> that's pingpong only, things will be worse on a memory-overloaded machine.
> But I don't expect things to be "less worse" if you do an intermediate copy
> through the memory near the HCA: you would overload the QPI link as much as
> here, and you would overload the CPU even more because of the additional
> copies.
>
> Brice
>
> Le 08/07/2013 18:27, Michael Thomadakis a écrit :
>
> People have mentioned that they experience unexpected slow downs in
> PCIe_gen3 I/O when the pages map to a socket different from the one the HCA
> connects to. It is speculated that the inter-socket QPI is not provisioned
> to transfer more than 1GiB/sec for PCIe_gen 3 traffic. This situation may
> not be in effect on all SandyBrige or IvyBridge systems.
>
> Have you measured anything like this on you systems as well? That would
> require using physical memory mapped to the socket w/o HCA exclusively for
> MPI messaging.
> > Mike > > > On Mon, Jul 8, 2013 at 10:52 AM, Jeff Squyres (jsquyres) < > jsquy...@cisco.com> wrote: > >> On Jul 8, 2013, at 11:35 AM, Michael Thomadakis >> wrote: >> >> > The issue is that when you read or write PCIe_gen 3 dat to a non-local >> NUMA memory, SandyBridge will use the inter-socket QPIs to get this data >> across to the other socket. I think there is considerable limitation in >> PCIe I/O traffic data going over the inter-socket QPI. One way to get >> around this is for reads to buffer all data into memory space local to the >> same socket and then transfer them by code across to the other socket's >> physical memory. For writes the same approach can be used with intermediary >> process copying data. >> >> Sure, you'll cause congestion across the QPI network when you do >> non-local PCI reads/writes. That's a given. >> >> But I'm not aware of a hardware limitation on PCI-requested traffic >> across QPI (I could be wrong, of course -- I'm a software guy, not a >> hardware guy). A simple test would be to bind an MPI process to a far NUMA >> node and run a simple MPI bandwidth test and see if to get >> better/same/worse bandwidth compared to binding an MPI process on a near >> NUMA socket. >> >> But in terms of doing intermediate (pipelined) reads/writes to local NUMA >> memory before reading/writing to PCI, no, Open MPI does not do this. >> Unless there is a PCI-QPI bandwidth constraint that we're unaware of, I'm >> not sure why you would do this -- it would likely add considerable >> complexity to the code and it would definitely lead to higher overall MPI >> latency. >> >> Don't forget that the MPI paradigm is for the application to provide the >> send/receive buffer. Meaning: MPI doesn't (always) control where the >> buffer is located (particularly for large messages). >> >> > I was wondering if OpenMPI does anything special memory mapping to work >> around this. >> >> Just what I mentioned in the prior email. 
>> >> > And if with Ivy Bridge (or Haswell) he situation has improved. >> >> Open MPI doesn't treat these chips any different. >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > > > ___ > users mailing > listusers@open-mpi.orghttp://www.open-mpi.org/mailman/listinfo.cgi/users > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
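The near-vs-far experiment Jeff suggests above can be sketched with numactl. This is a minimal sketch, not from the thread: it assumes the Intel MPI Benchmarks binary (IMB-MPI1) is available, that NUMA node 0 is the socket the HCA attaches to, and the hostnames are illustrative.

```shell
# Near case: pin each rank and its memory to the HCA's socket (node 0).
mpirun -np 2 --host nodeA,nodeB \
    numactl --cpunodebind=0 --membind=0 ./IMB-MPI1 PingPong

# Far case: same CPU binding, but force all pages onto the other socket,
# so every DMA by the HCA must cross the inter-socket QPI link.
mpirun -np 2 --host nodeA,nodeB \
    numactl --cpunodebind=0 --membind=1 ./IMB-MPI1 PingPong
```

Comparing the two PingPong results isolates the QPI penalty Brice quotes (6000 vs 5700 MB/s on his dual E5-2650 machine).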
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Thanks Tom, I will test it out... regards Michael On Mon, Jul 8, 2013 at 1:16 PM, Elken, Tom wrote: > Thanks Tom, that sounds good. I will give it a try as soon as our Phi host > here gets installed. > > I assume that all the prerequisite libs and bins on the Phi side are > available when we download the Phi s/w stack from Intel's site, right? > > [Tom] > > Right. When you install Intel’s MPSS (Manycore Platform Software > Stack), including following the section on “OFED Support” in the readme > file, you should have all the prerequisite libs and bins. Note that I have > not built Open MPI for Xeon Phi for your interconnect, but it seems to me > that it should work. > > -Tom > > Cheers > > Michael > > On Mon, Jul 8, 2013 at 12:10 PM, Elken, Tom wrote: > > Do you guys have any plan to support Intel Phi in the future? That is, > running MPI code on the Phi cards or across the multicore and Phi, as Intel > MPI does? > > [Tom] > > Hi Michael, > > Because a Xeon Phi card acts a lot like a Linux host with an x86 > architecture, you can build your own Open MPI libraries to serve this > purpose. > > Our team has used existing (an older 1.4.3 version of) Open MPI source to > build an Open MPI for running MPI code on Intel Xeon Phi cards over Intel’s > (formerly QLogic’s) True Scale InfiniBand fabric, and it works quite well. > We have not released a pre-built Open MPI as part of any Intel software > release. But I think if you have a compiler for Xeon Phi (Intel Compiler > or GCC) and an interconnect for it, you should be able to build an Open MPI > that works on Xeon Phi. > > Cheers, > Tom Elken > > thanks... > > Michael > > > > On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote: > > Rolf will have to answer the question on level of support. The CUDA code > is not in the 1.6 series as it was developed after that series went > "stable". 
It is in the 1.7 series, although the level of support will > likely be incrementally increasing as that "feature" series continues to > evolve. > > > > On Jul 6, 2013, at 12:06 PM, Michael Thomadakis > wrote: > > > Hello OpenMPI, > > > > I am wondering what level of support is there for CUDA and GPUdirect on > OpenMPI 1.6.5 and 1.7.2. > > > > I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, > it seems that with configure v1.6.5 it was ignored. > > > > Can you identify GPU memory and send messages from it directly without > copying to host memory first? > > > > > > Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2 ? > Do you support SDK 5.0 and above? > > > > Cheers ... > > Michael > > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > ** ** > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
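For reference, the build-time step discussed above can be sketched as follows. Per the thread, the flag only takes effect on the 1.7 series and is ignored by 1.6.x configure; the CUDA and install paths are illustrative.

```shell
# CUDA-aware build of the 1.7 series (1.6.x ignores --with-cuda).
./configure --with-cuda=/usr/local/cuda --prefix=/opt/openmpi-1.7.2
make -j4 && make install

# With a CUDA-aware build, supported transports can take device pointers
# as send/receive buffers, skipping an explicit copy to host memory.
```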
Re: [OMPI users] Question on handling of memory for communications
| The driver doesn't allocate much memory here. Maybe some small control buffers, but nothing significantly involved in large message transfer | performance. Everything critical here is allocated by user-space (either MPI lib or application), so we just have to make sure we bind the | process memory properly. I used hwloc-bind to do that. I see ... So the user-level process (user or MPI library) sets aside memory (malloc?), and OFED/IB then sets up RDMA messaging with addresses pointing back to that user physical memory. I guess before running the MPI benchmark you set the *data* memory allocation policy to allocate pages "owned" by the other socket? | Note that we have seen larger issues on older platforms. You basically just need a big HCA and PCI link on a not-so-big machine. Not very | common fortunately with todays QPI links between Sandy-Bridge socket, those are quite big compared to PCI Gen3 8x links to the HCA. On | old AMD platforms (and modern Intels with big GPUs), issues are not that uncommon (we've seen up to 40% DMA bandwidth difference | there). The issue that has been observed is with PCIe_gen3 traffic on attached I/O which, say, reads data off of the HCA and has to store it to memory, but that memory belongs to the other socket. In that case the PCIe data uses the QPI links on Sandy Bridge to send these packets over to the other socket. It has been speculated that the QPI links were NOT provisioned to transfer more than 1 GiB/s of PCIe data alongside the regular inter-NUMA memory traffic. It may be that Intel has since re-provisioned QPI to accommodate more PCIe traffic. Thanks again Michael On Mon, Jul 8, 2013 at 1:01 PM, Brice Goglin wrote: > The driver doesn't allocate much memory here. Maybe some small control > buffers, but nothing significantly involved in large message transfer > performance. 
Everything critical here is allocated by user-space (either > MPI lib or application), so we just have to make sure we bind the process > memory properly. I used hwloc-bind to do that. > > Note that we have seen larger issues on older platforms. You basically > just need a big HCA and PCI link on a not-so-big machine. Not very common > fortunately with todays QPI links between Sandy-Bridge socket, those are > quite big compared to PCI Gen3 8x links to the HCA. On old AMD platforms > (and modern Intels with big GPUs), issues are not that uncommon (we've seen > up to 40% DMA bandwidth difference there). > > Brice > > > > Le 08/07/2013 19:44, Michael Thomadakis a écrit : > > Hi Brice, > > thanks for testing this out. > > How did you make sure that the pinned pages used by the I/O adapter > mapped to the "other" socket's memory controller ? Is pining the MPI binary > to a socket sufficient to pin the space used for MPI I/O as well to that > socket? I think this is something done by and at the HCA device driver > level. > > Anyways, as long as the memory performance difference is a the levels > you mentioned then there is no "big" issue. Most likely the device driver > get space from the same numa domain that of the socket the HCA is attached > to. > > Thanks for trying it out > Michael > > > > > > > On Mon, Jul 8, 2013 at 11:45 AM, Brice Goglin wrote: > >> On a dual E5 2650 machine with FDR cards, I see the IMB Pingpong >> throughput drop from 6000 to 5700MB/s when the memory isn't allocated on >> the right socket (and latency increases from 0.8 to 1.4us). Of course >> that's pingpong only, things will be worse on a memory-overloaded machine. >> But I don't expect things to be "less worse" if you do an intermediate copy >> through the memory near the HCA: you would overload the QPI link as much as >> here, and you would overload the CPU even more because of the additional >> copies. 
>> >> Brice >> >> >> >> Le 08/07/2013 18:27, Michael Thomadakis a écrit : >> >> People have mentioned that they experience unexpected slow downs in >> PCIe_gen3 I/O when the pages map to a socket different from the one the HCA >> connects to. It is speculated that the inter-socket QPI is not provisioned >> to transfer more than 1GiB/sec for PCIe_gen 3 traffic. This situation may >> not be in effect on all SandyBrige or IvyBridge systems. >> >> Have you measured anything like this on you systems as well? That would >> require using physical memory mapped to the socket w/o HCA exclusively for >> MPI messaging. >> >> Mike >> >> >> On Mon, Jul 8, 2013 at 10:52 AM, Jeff Squyres (jsquyres) < >> jsquy...@cisco.com> wrote: >> >>> On Jul 8, 2013, at 11:35 AM, Michael Thomadakis < >>
Re: [OMPI users] Question on handling of memory for communications
| Remember that the point of IB and other operating-system bypass devices is that the driver is not involved in the fast path of sending / | receiving. One of the side-effects of that design point is that userspace does all the allocation of send / receive buffers. That's a good point. It was not clear to me who and with what logic was allocating memory. But definitely for IB it makes sense that the user provides pointers to their memory. thanks Michael On Mon, Jul 8, 2013 at 1:07 PM, Jeff Squyres (jsquyres) wrote: > On Jul 8, 2013, at 2:01 PM, Brice Goglin wrote: > > > The driver doesn't allocate much memory here. Maybe some small control > buffers, but nothing significantly involved in large message transfer > performance. Everything critical here is allocated by user-space (either > MPI lib or application), so we just have to make sure we bind the process > memory properly. I used hwloc-bind to do that. > > +1 > > Remember that the point of IB and other operating-system bypass devices is > that the driver is not involved in the fast path of sending / receiving. > One of the side-effects of that design point is that userspace does all > the allocation of send / receive buffers. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Hi Tim, Well, in general and not on MIC I usually build the MPI stacks using the Intel compiler set. Have you run into s/w that requires GCC instead of the Intel compilers (besides Nvidia CUDA)? Did you try to use the Intel compiler to produce MIC-native code (the OpenMPI stack, for that matter)? regards Michael On Mon, Jul 8, 2013 at 4:30 PM, Tim Carlson wrote: > On Mon, 8 Jul 2013, Elken, Tom wrote: > > It isn't quite so easy. > > Out of the box, there is no gcc on the Phi card. You can use the cross > compiler on the host, but you don't get gcc on the Phi by default. > > See this post > http://software.intel.com/en-us/forums/topic/382057 > > I really think you would need to build and install gcc on the Phi first. > > My first pass at doing a cross-compile with the GNU compilers failed to > produce something with OFED support (not surprising) > > export PATH=/usr/linux-k1om-4.7/bin:$PATH > ./configure --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \ > --disable-mpi-f77 > > checking if MCA component btl:openib can compile... no > > > Tim > > > >> >> >> Thanks Tom, that sounds good. I will give it a try as soon as our Phi host >> here host gets installed. >> >> >> >> I assume that all the prerequisite libs and bins on the Phi side are >> available when we download the Phi s/w stack from Intel's site, right ? >> >> [Tom] >> >> Right. When you install Intel’s MPSS (Manycore Platform Software Stack), >> including following the section on “OFED Support” in the readme file, you >> should have all the prerequisite libs and bins. Note that I have not >> built >> Open MPI for Xeon Phi for your interconnect, but it seems to me that it >> should work. >> >> >> >> -Tom >> >> >> >> Cheers >> >> Michael >> >> >> >> >> >> >> >> On Mon, Jul 8, 2013 at 12:10 PM, Elken, Tom wrote: >> >> Do you guys have any plan to support Intel Phi in the future? 
That is, >> running MPI code on the Phi cards or across the multicore and Phi, as >> Intel >> MPI does? >> >> [Tom] >> >> Hi Michael, >> >> Because a Xeon Phi card acts a lot like a Linux host with an x86 >> architecture, you can build your own Open MPI libraries to serve this >> purpose. >> >> Our team has used existing (an older 1.4.3 version of) Open MPI source to >> build an Open MPI for running MPI code on Intel Xeon Phi cards over >> Intel’s >> (formerly QLogic’s) True Scale InfiniBand fabric, and it works quite >> well. >> We have not released a pre-built Open MPI as part of any Intel software >> release. But I think if you have a compiler for Xeon Phi (Intel Compiler >> or GCC) and an interconnect for it, you should be able to build an Open >> MPI >> that works on Xeon Phi. >> >> Cheers, >> Tom Elken >> >> thanks... >> >> Michael >> >> >> >> On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote: >> >> Rolf will have to answer the question on level of support. The CUDA code >> is >> not in the 1.6 series as it was developed after that series went "stable". >> It is in the 1.7 series, although the level of support will likely be >> incrementally increasing as that "feature" series continues to evolve. >> >> >> >> On Jul 6, 2013, at 12:06 PM, Michael Thomadakis > > >> wrote: >> >> > Hello OpenMPI, >> > >> > I am wondering what level of support is there for CUDA and GPUdirect on >> OpenMPI 1.6.5 and 1.7.2. >> > >> > I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, >> it >> seems that with configure v1.6.5 it was ignored. >> > >> > Can you identify GPU memory and send messages from it directly without >> copying to host memory first? >> > >> > >> > Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2 ? >> Do >> you support SDK 5.0 and above? >> > >> > Cheers ... 
>> > Michael >> >> > __**_ >> > users mailing list >> > us...@open-mpi.org >> > http://www.open-mpi.org/**mailman/listinfo.cgi/users<http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> __**_ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/**mailman/listinfo.cgi/users<http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> >> >> __**_ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/**mailman/listinfo.cgi/users<http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> >> >> >> > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Tim, thanks for trying this out ... Now you should be able to let part of the same OpenMPI application run on the host multi-core side and the other part on the MIC. Intel MPI can do this using an MPMD command line, where the Xeon binaries run on the host and the MIC ones on the MIC card(s). I guess you should be able to do this directly from the same OpenMPI mpirun command line ... thanks Michael On Tue, Jul 9, 2013 at 12:18 PM, Tim Carlson wrote: > On Mon, 8 Jul 2013, Tim Carlson wrote: > > Now that I have gone through this process, I'll report that it works with > the caveat that you can't use the openmpi wrappers for compiling. Recall > that the Phi card does not have either the GNU or Intel compilers > installed. While you could build up a tool chain for the GNU compilers, > you're not going to get a native Intel compiler unless Intel decides to > support it. > > Here is the process from end to end to get Openmpi to build a native Phi > application. > > export PATH=/usr/linux-k1om-4.7/bin:$PATH > . /share/apps/intel/composer_xe_2013.3.163/bin/iccvars.sh intel64 > export CC="icc -mmic" > export CXX="icpc -mmic" > > cd ~ > tar zxf openmpi-1.6.4.tar.gz > cd openmpi-1.6.4 > ./configure --prefix=/people/tim/mic/openmpi/intel \ > --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \ > --disable-mpi-f77 \ > AR=x86_64-k1om-linux-ar RANLIB=x86_64-k1om-linux-ranlib > LD=x86_64-k1om-linux-ld > make > make install > > That leaves me with a native build of openmpi in > /people/tim/mic/openmpi/intel > > It is of course tempting to just do a > export PATH=/people/tim/mic/openmpi/intel/bin:$PATH > and start using mpicc to build my code but that does not work because: > > 1) If I try this on the host system I am going to get "wrong architecture" > because mpicc was built for the Phi and not for the x86_64 host > > 2) If I try running it on the Phi, I don't have access to "icc" because I > can't run the compiler directly on the Phi. 
> > I can "cheat" and see what the mpicc command really does by using "mpicc > --show" for another installation of openmpi and munge the paths correctly. > In this case > > icc -mmic cpi.c -I/people/tim/mic/openmpi/intel/include -pthread \ > -L/people/tim/mic/openmpi/intel/lib -lmpi -ldl -lm -Wl,--export-dynamic \ > -lrt -lnsl -lutil -lm -ldl -o cpi.x > > That leaves me with a Phi native version of cpi.x which I can then execute > on the Phi > > $ ssh phi002-mic0 > > ( I have NFS mounts on the Phi for all the bits I need ) > > ~ $ export PATH=/people/tim/mic/openmpi/intel/bin/:$PATH > ~ $ export LD_LIBRARY_PATH=/share/apps/intel/composer_xe_2013.3.163/compiler/lib/mic/ > ~ $ export LD_LIBRARY_PATH=/people/tim/mic/openmpi/intel/lib:$LD_LIBRARY_PATH > ~ $ cd mic > ~/mic $ mpirun -np 12 cpi.x > Process 7 on phi002-mic0.local > Process 10 on phi002-mic0.local > Process 2 on phi002-mic0.local > Process 9 on phi002-mic0.local > Process 1 on phi002-mic0.local > Process 3 on phi002-mic0.local > Process 11 on phi002-mic0.local > Process 5 on phi002-mic0.local > Process 8 on phi002-mic0.local > Process 4 on phi002-mic0.local > Process 6 on phi002-mic0.local > Process 0 on phi002-mic0.local > pi is approximately 3.1416009869231245, Error is 0.0814 > wall clock time = 0.001766 > > > > On Mon, 8 Jul 2013, Elken, Tom wrote? >> >> My mistake on the OFED bits. The host I was installing on did not have >> all of the MPSS software installed (my cluster admin node and not one of >> the compute nodes). Adding the intel-mic-ofed-card RPM fixed the problem >> with compiling the btl:openib bits with both the GNU and Intel compilers >> using the cross-compiler route (-mmic on the Intel side) >> >> Still working on getting the resulting mpicc wrapper working on the MIC >> side. When I get a working example I'll post the results. >> >> Thanks! 
>> >> Tim >> >> >> >>> >>> >>> Hi Tim, >>> >>> >>> >>> >>> >>> Well, in general and not on MIC I usually build the MPI stacks using the >>> Intel compiler set. Have you ran into s/w that requires GCC instead of >>> Intel >>> compilers (beside Nvidia Cuda)? Did you try to use Intel compiler to >>> produce >>> MIC native code (the OpenMPI stack for that matter)? >>> >>> [Tom] >>> >>> Good idea Michael, With the Intel Compiler, I would use the -mmic flag >>> to >>> build MIC code. >>> >>> >
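The symmetric (host + MIC) launch Michael asks about would use mpirun's normal MPMD colon syntax. This is a sketch only, under the assumption that a host-side binary and a Phi-native binary of the same application have both been built (as in Tim's recipe above); the binary names, rank counts, and hostnames are illustrative, and the thread does not confirm this mode was tested.

```shell
# MPMD launch: Xeon binaries on the host, MIC-native binaries on the card.
mpirun -np 16 --host phi002      ./app.host \
     : -np 60 --host phi002-mic0 ./app.mic
```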
[OMPI users] Planned support for Intel Phis
Hello OpenMPI, I was wondering what support is being implemented for the Intel Phi platforms. That is, would we be able to run MPI code in "symmetric" fashion, where some ranks run on the cores of the multicore hosts and some on the cores of the Phis, in a multinode cluster environment? Also, is it based on OFED 1.5.4.1, or on which OFED? Best regards Michael
Re: [OMPI users] openmpi fails on mx endpoint busy
If the machine is multi-processor, you might want to add the sm btl. That cleared up some similar problems for me, though I don't use mx so your mileage may vary. On 7/5/07, SLIM H.A. wrote: Hello I have compiled openmpi-1.2.3 with the --with-mx= configuration and gcc compiler. On testing with 4-8 slots I get an error message, the mx ports are busy: >mpirun --mca btl mx,self -np 4 ./cpi [node001:10071] mca_btl_mx_init: mx_open_endpoint() failed with status=20 [node001:10074] mca_btl_mx_init: mx_open_endpoint() failed with status=20 [node001:10073] mca_btl_mx_init: mx_open_endpoint() failed with status=20 -- Process 0.1.0 is unable to reach 0.1.1 for MPI communication. If you specified the use of a BTL component, you may have forgotten a component (such as "self") in the list of usable components. ... snipped It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): PML add procs failed --> Returned "Unreachable" (-12) instead of "Success" (0) -- *** An error occurred in MPI_Init *** before MPI was initialized *** MPI_ERRORS_ARE_FATAL (goodbye) mpirun noticed that job rank 0 with PID 10071 on node node001 exited on signal 1 (Hangup). I would not expect mx messages as communication should not go through the mx card? (This is a twin dual core shared memory node) The same happens when testing on 2 nodes, using a hostfile. I checked the state of the mx card with mx_endpoint_info and mx_info, they are healthy and free. What is missing here? Thanks Henk ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
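The suggestion above, spelled out as a command line: add the shared-memory BTL so that on-node ranks talk over sm instead of each opening an MX endpoint.

```shell
# Include sm for same-node ranks alongside mx and self.
mpirun --mca btl mx,sm,self -np 4 ./cpi
```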
Re: [OMPI users] mpirun hanging followup
If you are having difficulty getting openmpi set up yourself, you might look into OSCAR or Rocks; they make setting up your cluster much easier and include various MPI packages as well as other utilities for reducing your management overhead. I can help you (off list) get set up with OSCAR if you like, and there are very helpful mailing lists for both projects. On 7/17/07, Bill Johnstone wrote: Hello all. I could really use help trying to figure out why mpirun is hanging as detailed in my previous message yesterday, 16 July. Since there's been no response, please allow me to give a short summary. -Open MPI 1.2.3 on GNU/Linux, 2.6.21 kernel, gcc 4.1.2, bash 3.2.15 is default shell -Open MPI installed to /usr/local, which is in non-interactive session path -Systems are AMD64, using ethernet as interconnect, on private IP network mpirun hangs whenever I invoke any process running on a remote node. It runs a job fine if I invoke it so that it only runs on the local node. Ctrl+C never successfully cancels an mpirun job -- I have to use kill -9. I'm asking for help trying to figure what steps have been taken by mpirun, and how I can figure out where things are getting stuck / crashing. What could be happening on the remote nodes? What debugging steps can I take? Without MPI running, the cluster is of no use, so I would really appreciate some help here. ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
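A few first debugging steps for a remote-launch hang like Bill's, offered as a sketch (flags as in the 1.2-era mpirun; hostnames and interface names are illustrative):

```shell
# Keep the remote daemons' output visible instead of letting it vanish;
# a trivial non-MPI command isolates launch problems from MPI problems.
mpirun --debug-daemons -np 2 --host node4 hostname

# Rule out interface-selection problems by restricting both the TCP BTL
# and the out-of-band channel to the private cluster network.
mpirun --mca btl_tcp_if_include eth1 --mca oob_tcp_if_include eth1 \
       -np 2 --host node4 hostname
```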
Re: [OMPI users] mpirun hanging followup
On 7/17/07, Bill Johnstone wrote: Thanks for the help. I've replied below. --- "G.O." wrote: > 1- Check to make sure that there are no firewalls blocking > traffic between the nodes. There is no firewall in-between the nodes. If I run jobs directly via ssh, e.g. "ssh node4 env" they work. Are you using host based authentication of some kind? ie, are you being prompted for a password when you ssh between nodes?
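A quick way to answer G.O.'s question about password prompts, as a sketch (`node4` as in the thread): BatchMode makes ssh fail immediately instead of prompting, and a prompt is exactly what makes mpirun appear to hang.

```shell
# Fails fast if a password would be required; succeeds silently otherwise.
ssh -o BatchMode=yes node4 true && echo "passwordless ok"

# If it fails, one common fix is a passphrase-less key shared to the nodes
# (paths are the OpenSSH defaults):
ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
```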
[OMPI users] OpenMPI and PathScale problem
I'm trying to get the PathScale Fortran compiler working with OpenMPI on a 64-bit Linux machine and can't get past a simple demo program. Here is detailed info: pathf90 -v PathScale EKOPath(TM) Compiler Suite: Version 2.5 Built on: 2006-08-22 21:02:51 -0700 Thread model: posix GNU gcc version 3.3.1 (PathScale 2.5 driver) mpif90 --show pathf90 -I/home/fort/usr//include -pthread -I/home/fort/usr//lib -L/home/fort/usr//lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl The OpenMPI version 1.2.3 resides in the /home/fort/usr/ directory. When I compile a simple program using mpif90 -o test test.f90 I get a binary all right, but it has broken library links ldd test libmpi_f90.so.0 => not found libmpi_f77.so.0 => not found libmpi.so.0 => /usr/lib64/lam/libmpi.so.0 (0x003db360) libopen-rte.so.0 => not found libopen-pal.so.0 => not found libdl.so.2 => /lib64/libdl.so.2 (0x003db320) libnsl.so.1 => /lib64/libnsl.so.1 (0x003db990) libutil.so.1 => /lib64/libutil.so.1 (0x003db840) libmv.so.1 => /opt/pathscale/lib/2.5/libmv.so.1 (0x002a9557f000) libmpath.so.1 => /opt/pathscale/lib/2.5/libmpath.so.1 (0x002a956a8000) libm.so.6 => /lib64/tls/libm.so.6 (0x003db300) libpathfortran.so.1 => /opt/pathscale/lib/2.5/libpathfortran.so.1 (0x002a957c9000) libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x003db380) libc.so.6 => /lib64/tls/libc.so.6 (0x003db2d0) /lib64/ld-linux-x86-64.so.2 (0x003db290) The demo program fails to start due to missing shared libraries. In addition, the binary picks up the LAM MPI library (/usr/lib64/lam/libmpi.so.0) instead of OpenMPI! Any ideas on where the problem could be? Michael ******** Mgr. Michael Komm Tokamak Department Institute of Plasma Physics of Academy of Sciences of Czech Republic E-mail:k...@ipp.cas.cz Za Slovankou 3 182 00 PRAGUE 8
Re: [OMPI users] OpenMPI and PathScale problem
Thanks Christian, it works just fine now! I altered LIBRARY_PATH and LD_PATH but not this one :) Michael __ > From: christian.bec...@math.uni-dortmund.de > To: Open MPI Users > Date: 07.08.2007 19:32 > Subject: Re: [OMPI users] OpenMPI and PathScale problem > >Hi Michael, > >you have to add the path to the openmpi libraries to the LD_LIBRARY_PATH variable > >export LD_LIBRARY_PATH=/home/fort/usr//lib > >should fix the problem. > >Bye, >Christian > >Michael Komm wrote: >> I'm trying to make work the pathscale fortran compiler with OpenMPI on a 64bit Linux machine and can't get passed a simple demo program. Here is detailed info: >> >> pathf90 -v >> PathScale EKOPath(TM) Compiler Suite: Version 2.5 >> Built on: 2006-08-22 21:02:51 -0700 >> Thread model: posix >> GNU gcc version 3.3.1 (PathScale 2.5 driver) >> >> mpif90 --show >> pathf90 -I/home/fort/usr//include -pthread -I/home/fort/usr//lib -L/home/fort/usr//lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl >> >> The OpenMPI version 1.2.3 resides in the /home/fort/usr/ directory. 
>> >> When I compile a simple program using >> >> mpif90 -o test test.f90 >> >> I get a binary all right but it has broken linked libraries >> >> ldd test >> libmpi_f90.so.0 => not found >> libmpi_f77.so.0 => not found >> libmpi.so.0 => /usr/lib64/lam/libmpi.so.0 (0x003db360) >> libopen-rte.so.0 => not found >> libopen-pal.so.0 => not found >> libdl.so.2 => /lib64/libdl.so.2 (0x003db320) >> libnsl.so.1 => /lib64/libnsl.so.1 (0x003db990) >> libutil.so.1 => /lib64/libutil.so.1 (0x003db840) >> libmv.so.1 => /opt/pathscale/lib/2.5/libmv.so.1 (0x002a9557f000) >> libmpath.so.1 => /opt/pathscale/lib/2.5/libmpath.so.1 (0x002a956a8000) >> libm.so.6 => /lib64/tls/libm.so.6 (0x003db300) >> libpathfortran.so.1 => /opt/pathscale/lib/2.5/libpathfortran.so.1 (0x002a957c9000) >> libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x003db380) >> libc.so.6 => /lib64/tls/libc.so.6 (0x003db2d0) >> /lib64/ld-linux-x86-64.so.2 (0x003db290) >> >> The demo program fails to start due to missing shared libraries. In addition the pathf90 uses some lame mpi library instead of openMPI! Any ideas on where the problem could be? >> >> Michael >> >> >> Mgr. Michael Komm >> Tokamak Department >> Institute of Plasma Physics of Academy of Sciences of Czech Republic >> E-mail:k...@ipp.cas.cz >> Za Slovankou 3 >> 182 00 >> PRAGUE 8 >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > >___ >users mailing list >us...@open-mpi.org >http://www.open-mpi.org/mailman/listinfo.cgi/users >
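To make Christian's fix stick and verify that it took, one can set the path and recheck the binary with ldd (library path as given in the thread; the binary name `test` is from Michael's compile line):

```shell
# Put the Open MPI libraries on the runtime search path, preserving any
# existing value of LD_LIBRARY_PATH.
export LD_LIBRARY_PATH=/home/fort/usr/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# Confirm the formerly-missing libraries now resolve:
ldd ./test | grep -E 'libmpi|libopen'   # no line should say "not found"
```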
[OMPI users] sed: 33: ...: unescaped newline inside substitute pattern
I have been attempting to compile open-mpi, both 1.2.2 and 1.2.3 on a new iMac (core 2 duo, 2.4 GHz, OS X 10.4.10), using gfortran as my fortran compiler, and a very recent Xtools (ld -v gives version cctools-622.5.obj~13). I have tried both the full line, configure --prefix=/usr/local/openmpi --disable-mpi-cxx --disable-mpi- f90 --without-xgrid FC=gfortran as well as a truncated line, configure --prefix=/usr/local/openmpi and switched compilers via setenv FC g95 configure --prefix=/usr/local/openmpi --disable-mpi-cxx --disable-mpi- f90 --without-xgrid and in all cases, after minutes of working away, get to the point that someone else got to last year (when it tries to create the Makefiles, etc) and get the following output (approximately 200 pairs of sed:33 and sed:4's). This has been happening for over a week, with reboots every night. I attach the configure terminal output as well as the log file (for a 1.2.2 attempt). ompi-output.tar.gz Description: GNU Zip compressed data ...checking for OMPI LIBS... -lSystemStubs checking for OMPI extra include dirs... openmpi *** Final output configure: creating ./config.status config.status: creating ompi/include/ompi/version.h sed: 33: ./confstatkVPvQm/subs-3.sed: unescaped newline inside substitute pattern sed: 4: ./confstatkVPvQm/subs-4.sed: unescaped newline inside substitute pattern config.status: creating orte/include/orte/version.h sed: 33: ./confstatkVPvQm/subs-3.sed: unescaped newline inside substitute pattern Michael Clover mclo...@san.rr.com
[OMPI users] ompi-1.2.4 fails to make on iMac (10.4.10)
I was just trying to install openmpi-1.2.4 on a brand new iMac (2.4 GHZ Intel Core 2 Duo, 1GB RAM, OSX 10.4.10), having just loaded the xtools environnment. I am able to successfully run the configure, but make dies instantly: configure -prefix=/usr/local/openmpi --disable-mpi-cxx --disable-mpi- f90 --without-xgrid FC=gfortran | tee config.out ... config.status: executing depfiles commands config.status: executing libtool commands cloverm:~/bin/openmpi-1.2.4:[22]>make -j 4 |& tee make.out Makefile:602: *** missing separator. Stop. cloverm:~/bin/openmpi-1.2.4:[23]>ls *.out config.out make.out cloverm:~/bin/openmpi-1.2.4:[24]>tar -zcvf ompi-output.tar.gz *.log *.out config.log config.out make.out cloverm:~/bin/openmpi-1.2.4:[25]>ld -v Apple Computer, Inc. version cctools-622.5.obj~13 I have copied lines 599-609 from Makefile, so you can see that Make is trying to run gcc, in a way that doesn't look correct OMPI_AS_GLOBAL = OMPI_AS_LABEL_SUFFIX = OMPI_CC_ABSOLUTE = DISPLAY known /usr/bin/gcc OMPI_CONFIGURE_DATE = Sat Oct 6 16:05:59 PDT 2007 OMPI_CONFIGURE_HOST = michael-clovers-computer.local OMPI_CONFIGURE_USER = mrc OMPI_CXX_ABSOLUTE = DISPLAY known /usr/bin/g++ OMPI_F77_ABSOLUTE = none OMPI_F90_ABSOLUTE = none I am also attaching the tee'd results, the config.log, and the Makefile that doesn't work: cloverm:~/bin/openmpi-1.2.4:[27]>tar -zcvf ompi-output.tar.gz *.log *.out Makefile config.log config.out make.out Makefile ompi-output.tar.gz Description: GNU Zip compressed data Michael Clover mclo...@san.rr.com
[OMPI users] sed and openmpi on Mac OSX 10.4.10
Jeff, I tried to look at the checksum of my version of sed, and got a different number. I also found instructions on an Octave web page about loading the GNU sed on a Mac, to replace the POSIX flavored one that comes with it. I was able to load sed-4.1.4, but still don't get your checksums (I changed the name of the original Mac sed to __sed). Are you using the Mac supplied sed or not? cloverm:~/bin/openmpi-1.2.3:[132]>md5 /usr/local/bin/sed MD5 (/usr/local/bin/sed) = 01f9ed14ed1fa9fcf7406dd8a609 cloverm:~/bin/openmpi-1.2.3:[133]>md5 /usr/bin/__sed MD5 (/usr/bin/__sed) = e8e106779d71f6f2cca9c7157ce4b5eb However, this new sed made only a slight difference on openmpi-1.2.3: instead of getting unescaped newlines, I now get unterminated "s" commands: (and with openmpi-1.2.4, I still get the same problem reported yesterday when I try to "make" the successfully configured 1.2.4, namely that line 602 of Makefile is missing a separator). checking for OMPI LIBS... checking for OMPI extra include dirs... *** Final output configure: creating ./config.status config.status: creating ompi/include/ompi/version.h sed: file ./confstatA1BhUF/subs-3.sed line 33: unterminated `s' command sed: file ./confstatA1BhUF/subs-4.sed line 4: unterminated `s' command config.status: creating orte/include/orte/version.h Michael Clover mclo...@san.rr.com
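The two different error messages are consistent with two different sed implementations reading the same broken script: Apple's BSD sed reports "unescaped newline inside substitute pattern", while GNU sed reports "unterminated `s' command". A literal newline inside an s/// pattern, which is apparently what config.status generated into the subs-*.sed files, reproduces both:

```shell
# Feed sed an s/// command whose pattern contains a raw newline.
# BSD sed calls this an unescaped newline inside the substitute pattern;
# GNU sed treats the newline as ending the command and reports it as
# an unterminated `s' command.
err=$(printf 'abc\n' | sed 's/a
b/X/' 2>&1) || true
echo "$err"
```

So swapping in GNU sed changes the wording of the failure but not the underlying problem: configure is emitting a substitute pattern with a raw newline in it.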
[OMPI users] ompi-1.2.4 fails to make on iMac (10.4.10)
Reuti,

My gcc is also in /usr/bin as gcc:

cloverm:~:[5]>which gcc
/usr/bin/gcc
cloverm:~:[6]>gcc -v
Using built-in specs.
Target: i686-apple-darwin8
Configured with: /private/var/tmp/gcc/gcc-5367.obj~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=powerpc-apple-darwin8 --with-arch=nocona --with-tune=generic --program-prefix= --host=i686-apple-darwin8 --target=i686-apple-darwin8
Thread model: posix
gcc version 4.0.1 (Apple Computer, Inc. build 5367)

I thought the " DISPLAY known" might have been some result of my .tcshrc file, so I started up sh in a terminal window before running configure and make, but I still get the same error.

Michael Clover
mclo...@san.rr.com

On Oct 8, 2007, at 9:00 , users-requ...@open-mpi.org wrote:

Message: 2
Date: Mon, 8 Oct 2007 17:19:57 +0200
From: Reuti
Subject: Re: [OMPI users] ompi-1.2.4 fails to make on iMac (10.4.10)
To: Open MPI Users
Message-ID: <897b6169-808d-4581-a19b-4cb8da2e3...@staff.uni-marburg.de>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

Am 07.10.2007 um 01:16 schrieb Michael Clover:

I was just trying to install openmpi-1.2.4 on a brand new iMac (2.4 GHz Intel Core 2 Duo, 1 GB RAM, OS X 10.4.10), having just loaded the xtools environment. I am able to run configure successfully, but make dies instantly:

configure -prefix=/usr/local/openmpi --disable-mpi-cxx --disable-mpi-f90 --without-xgrid FC=gfortran | tee config.out
...
config.status: executing depfiles commands
config.status: executing libtool commands
cloverm:~/bin/openmpi-1.2.4:[22]>make -j 4 |& tee make.out
Makefile:602: *** missing separator.  Stop.
cloverm:~/bin/openmpi-1.2.4:[23]>ls *.out
config.out  make.out
cloverm:~/bin/openmpi-1.2.4:[24]>tar -zcvf ompi-output.tar.gz *.log *.out
config.log
config.out
make.out
cloverm:~/bin/openmpi-1.2.4:[25]>ld -v
Apple Computer, Inc. version cctools-622.5.obj~13

I have copied lines 599-609 from the Makefile, so you can see that make is trying to run gcc in a way that doesn't look correct:

OMPI_AS_GLOBAL =
OMPI_AS_LABEL_SUFFIX =
OMPI_CC_ABSOLUTE =  DISPLAY known
/usr/bin/gcc

The " DISPLAY known" shouldn't be there. What does a plain:

which gcc

give? Just /usr/bin/gcc, as for me, or something more?

-- Reuti

OMPI_CONFIGURE_DATE = Sat Oct 6 16:05:59 PDT 2007
OMPI_CONFIGURE_HOST = michael-clovers-computer.local
OMPI_CONFIGURE_USER = mrc
OMPI_CXX_ABSOLUTE =  DISPLAY known
/usr/bin/g++
OMPI_F77_ABSOLUTE = none
OMPI_F90_ABSOLUTE = none
[OMPI users] ompi-1.2.4 fails to make on iMac (10.4.10)
Jeff,

As it turned out, my .tcshrc file did output "DISPLAY known"... I had logic to set DISPLAY if it was undefined:

if ( ! $?DISPLAY ) then
    if ( ! $?SSH_CLIENT ) then
        if ( "$OS" == "darwin" ) then
            # ... irrelevant
        else
            echo "no environment variable to capture your IP from"
            set w_data = $user
            setenv DISPLAY ${w_data}:0.0
        endif
    else
        set whom = `echo $SSH_CLIENT`
        if ( $?whom ) then
            set i_am = `echo $whom[1] | sed -e "s/:::128/128/"`
            setenv DISPLAY ${i_am}:0.0
            echo " DISPLAY set from SSH_CLIENT"
        endif
    endif
else
    echo " DISPLAY known"
endif

I have commented out the echo of "DISPLAY known", and now openmpi-1.2.4 makes just fine. Furthermore, openmpi-1.2.3 no longer generates the unterminated sed commands either, and also makes correctly. I must have mistyped something when I grep'ed for "display" or "known" before my reply to Reuti, since I didn't find it until your question.

Thanks for all the help.

Michael Clover
mclo...@san.rr.com

On Oct 8, 2007, at 11:09 , users-requ...@open-mpi.org wrote:

Message: 5
Date: Tue, 9 Oct 2007 08:08:23 +0200
From: Jeff Squyres
Subject: Re: [OMPI users] ompi-1.2.4 fails to make on iMac (10.4.10)
To: Open MPI Users
Message-ID: <897eb321-9a89-4d8b-8b19-d53225573...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

From the files you attached, I see the following in config.log:

OMPI_CC_ABSOLUTE=' DISPLAY known

and several lines later:

OMPI_CXX_ABSOLUTE=' DISPLAY known

But in Makefile, I see this bogus 2-line value (same as you noted):

OMPI_CC_ABSOLUTE =  DISPLAY known
/usr/bin/gcc

and several lines later:

OMPI_CXX_ABSOLUTE =  DISPLAY known
/usr/bin/g++

Note that we set these two values in configure with the following commands:

OMPI_CC_ABSOLUTE="`which $CC`"
OMPI_CXX_ABSOLUTE="`which $CXX`"

So I *suspect* that the bogus values in config.status are totally hosing you when trying to create all the other files -- the version of "sed" is a red herring.
What exactly is your output when you run "which gcc" and "which g++"? We are blindly taking the whole value -- mainly because I've never seen "which foo" give more than one line on stdout. ;-)

What *could* be happening is that your shell startup files are generating some output (e.g., "DISPLAY known") and that's being output before "which foo" is run because of the `` usage. Do your shell startup files emit "DISPLAY known" when you start up?

--
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

End of users Digest, Vol 713, Issue 1
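[Editor's note: Jeff's diagnosis can be reproduced without Open MPI at all -- anything a sourced startup file prints becomes part of the value captured by backticks. A minimal sketch, using a hypothetical /tmp/noisy_rc file to stand in for the user's .tcshrc:]

```shell
# Simulate a startup file that echoes a status line:
cat > /tmp/noisy_rc <<'EOF'
echo " DISPLAY known"
EOF

# configure effectively runs: OMPI_CC_ABSOLUTE="`which $CC`"
# If the shell evaluating the backticks sources the noisy file first,
# the echoed text lands in the captured value ahead of the real path:
captured=`sh -c '. /tmp/noisy_rc; which sh'`
echo "$captured"
```

The captured variable ends up a two-line value -- the echoed " DISPLAY known" followed by the real path -- which is exactly the bogus 2-line Makefile assignment seen above.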
[OMPI users] mpicc Segmentation Fault with Intel Compiler
Hi,

I have the same problem described by some other users: I can't compile anything if I'm using an Open MPI that was built with the Intel compiler.

> ompi_info --all
Segmentation fault

OpenSUSE 10.3
Kernel: 2.6.22.9-0.4-default
Intel P4
Configure flags: CC=icc, CXX=icpc, F77=ifort, F90=ifort
Intel compiler: both C and Fortran 10.0.025

Is there any known solution?

Thanks, Michael
Re: [OMPI users] mpicc Segmentation Fault with Intel Compiler
On 06.11.2007, at 10:42, Åke Sandgren wrote:

Hi,

On Tue, 2007-11-06 at 10:28 +0100, Michael Schulz wrote:

> Hi, I've the same problem described by some other users, that I can't compile anything if I'm using the Open MPI compiled with the Intel compiler.
>
> ompi_info --all
> Segmentation fault
>
> OpenSUSE 10.3
> Kernel: 2.6.22.9-0.4-default
> Intel P4
> Configure flags: CC=icc, CXX=icpc, F77=ifort, F90=ifort
> Intel compiler: both C and Fortran 10.0.025
>
> Is there any known solution?

I had the same problem with pathscale. Try this; I think it is the fix I found:

diff -ru site/opal/runtime/opal_init.c amd64_ubuntu606-psc/opal/runtime/opal_init.c
--- site/opal/runtime/opal_init.c	2007-10-20 03:00:35.0 +0200
+++ amd64_ubuntu606-psc/opal/runtime/opal_init.c	2007-10-23 16:12:15.0 +0200
@@ -169,7 +169,7 @@
     }

     /* register params for opal */
-    if (OPAL_SUCCESS != opal_register_params()) {
+    if (OPAL_SUCCESS != (ret = opal_register_params())) {
         error = "opal_register_params";
         goto return_error;
     }

Thanks, but this doesn't solve my segv problem.

Michael
[OMPI users] CfP 3rd Workshop on Virtualization in HPC Cluster and Grid Computing Environments (VHPC'08)
Apologies if you received multiple copies of this message.

=== CALL FOR PAPERS ===

3rd Workshop on Virtualization in High-Performance Cluster and Grid Computing (VHPC'08)
as part of Euro-Par 2008, Las Palmas de Gran Canaria, Canary Islands, Spain

Date: August 26-29, 2008
Euro-Par 2008: http://europar2008.caos.uab.es/
Workshop URL: http://xhpc.wu-wien.ac.at/

SUBMISSION DEADLINE:
Abstracts: February 4, 2008
Full Paper: April 14, 2008

Scope:

Virtual machine monitors (VMMs) are becoming tightly integrated with standard OS distributions, leading to increased adoption in many application areas, including scientific, educational, and high-performance computing (HPC). VMMs allow for the concurrent execution of potentially large numbers of virtual machines, providing encapsulation, isolation, and the possibility of migrating VMs between physical hosts. These features enable physical clusters to be treated as "computation pools", where a variety of execution environments can be dynamically instantiated on the underlying hardware. VM technology is therefore opening up new architectures and services for HPC in cluster and grid environments, but consensus has not yet emerged on the best models and tools. This workshop aims to bring together researchers and practitioners working on virtualization in HPC environments, with the goal of sharing experience and promoting the development of a research community in this emerging area.

The workshop will be one day in length, composed of 20 min paper presentations, each followed by a 10 min discussion section. Presentations may be accompanied by interactive demonstrations. The workshop will also include a 30 min panel discussion by presenters.
TOPICS

Topics include, but are not limited to, the following subject matters:

- Virtualization in cluster and grid environments
- Workload characterizations for VM-based clusters
- VM cluster and grid architectures
- Cluster reliability, fault-tolerance, and security
- Compute job entry and scheduling
- Compute workload load leveling
- Cluster and grid filesystems for VMs
- VMMs, VMs and QoS guarantees
- Research and education use cases
- VM cluster distribution algorithms
- MPI, PVM on virtual machines
- System sizing
- Hardware support for virtualization
- High-speed interconnects in hypervisors
- Hypervisor extensions and utilities for cluster and grid computing
- Network architectures for VM-based clusters
- VMMs/Hypervisors on large SMP machines
- Performance models
- Performance management and tuning of hosts and guest VMs
- Power considerations
- VMM performance tuning on various load types
- Xen/other VMM cluster/grid tools
- High-speed device access from VMs
- Management and deployment of clusters and grid environments with VMs
- Information systems for virtualized clusters
- Management of system images for virtual machines
- Integration with relevant standards, e.g. CIM, GLUE, OGF, etc.

PAPER SUBMISSION

Papers submitted to the workshop will be reviewed by at least two members of the program committee and external reviewers. Submissions should include an abstract, key words, and the e-mail address of the corresponding author, and must not exceed 10 pages, including tables and figures, at a main font size no smaller than 11 point. Submission of a paper should be regarded as a commitment that, should the paper be accepted, at least one of the authors will register and attend the conference to present the work. Accepted papers will be published in the Springer LNCS series; the format must follow the Springer LNCS style. Initial submissions are in PDF; accepted papers will be requested to provide source files.
http://www.springer.de/comp/lncs/authors.html

Submission Link: http://www.edas.info/newPaper.php?c=6123

IMPORTANT DATES

Abstract submissions due: February 4, 2008
Full paper submissions due: April 14, 2008
Acceptance notification: May 3, 2008
Camera-ready due: May 26, 2008
Conference: August 26-29, 2008

CHAIRS

Michael Alexander (chair), WU Vienna, Austria
Stephen Childs (co-chair), Trinity College, Dublin, Ireland

PROGRAM COMMITTEE

Jussara Almeida, Federal University of Minas Gerais, Brazil
Padmashree Apparao, Intel Corp., US
Hassan Barada, Etisalat University College, UAE
Volker Buege, University of Karlsruhe, Germany
Simon Crosby, Xensource, UK
Marcus Hardt, Forschungszentrum Karlsruhe, Germany
Sverre Jarp, CERN, Switzerland
Krishna Kant, Intel Corporation, US
Yves Kemp, University of Karlsruhe, Germany
Naoya Maruyama, Tokyo Institute of Technology, Japan
Jean-Marc Menaud, Ecole des Mines de Nantes, France
José E. Moreira, IBM Watson Research Center, US
Yoshio Turner, HP Labs
Andreas Unterkircher, CERN, Switzerland
Dongyan Xu, Purdue University, US

GENERAL INFORMATION

The workshop will be held as part of Euro-Par 2008, Las Palmas de Gran Canaria, C
Re: [OMPI users] RPM build errors when creating multiple rpms
On Tuesday, 18 March 2008, at 12:15:34 (-0700), Christopher Irving wrote:

> Now, if you removed lines 651 and 653 from the new spec file it works for both cases. You won't get the "files listed twice" error because, although you have the statement %dir %{_prefix} on line 649, you never have a line with just %{_prefix}. So the _prefix directory itself gets included, but not all files underneath it. You've handled that by explicitly including all files and subdirectories on lines 672-681 and in the runtime.file.

The only package which should own %{_prefix} is something like setup or filesystem in the core OS package set. No openmpi RPM should ever own %{_prefix}, so it should never appear in %files, either by itself or with %dir.

> Going back to the original spec file, the one that came with the source RPM, the problems were kind of reversed. Building with the 'install_in_opt 1' option worked just fine, but when it wasn't set you got the "files listed twice" error, as I described in my original message.

"files listed twice" messages are not errors, per se, and can usually be safely ignored. Those who are truly bothered by them can always add %exclude directives if they so choose.

Michael
--
Michael Jennings
Linux Systems and Cluster Admin
UNIX and Cluster Computing Group
Re: [OMPI users] RPM build errors when creating multiple rpms
On Tuesday, 18 March 2008, at 18:18:36 (-0700), Christopher Irving wrote:

> Well, you're half correct. You're thinking that _prefix is always defined as /usr.

No, actually I'm not. :)

> But in the case where install_in_opt is defined, they have redefined _prefix to be /opt/%{name}/%{version}, in which case it is fine for one of the openmpi RPMs to claim that directory with a %dir directive.

Except that you should never do that. First off, RPMs should never install in /opt by default. Secondly, the correct way to support installing in /opt is to list the necessary prefixes in the RPM headers so that the --prefix option (or the --relocate option) may be used at install time. OpenMPI already has hooks (IIRC) for figuring things out intelligently based on invocation prefix, so it should fit quite nicely into this model. Obviously RPMs only intended for local use can do anything they want, but RPMs which install in /opt should never be redistributed.

> However, I think you missed the point. I'm not suggesting they need to add a %{_prefix} statement in the %files section; I'm just pointing out what's not the source of the duplicated files. In other words, %dir %{_prefix} is not the same as %{_prefix} and won't cause all the files in _prefix to be included.

That's correct.

> It can't be safely ignored when it causes the rpm build to fail.

The warning by itself should never cause rpmbuild to fail. If it does, the problem lies elsewhere. Nothing in either the rpm 4.4 nor the rpm 5 code can cause failure at that point.

> Also, you don't want to use an %exclude because that would prevent the specified files from ever getting included, which is not the desired result.

If you use %exclude in only one of the locations where the file is listed (presumably the "less correct" one), it will solve the problem.

Michael
--
Michael Jennings
Linux Systems and Cluster Admin
UNIX and Cluster Computing Group
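[Editor's note: for readers following along, the %files conventions being contrasted in this exchange look roughly like the illustrative fragment below -- this is not the actual openmpi.spec, and the file paths are hypothetical:]

```spec
%files runtime
# "%dir" claims only the directory entry itself, none of the files in it:
%dir %{_prefix}/share/openmpi
# A bare path pattern pulls in everything it matches; if two patterns
# match the same file, rpmbuild emits the "File listed twice" warning:
%{_prefix}/bin/mpirun
%{_prefix}/lib/*.so.*
# Masking one of the duplicate listings with %exclude silences the
# warning -- but drops the file from this package entirely:
#%exclude %{_prefix}/share/openmpi/help-mpirun.txt
```

As Michael notes above, the directory %{_prefix} itself should be owned by a base package such as filesystem, so a spec would normally apply %dir only to subdirectories it genuinely creates.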
Re: [OMPI users] cluster LiveCD
On Thursday, 07 August 2008, at 15:03:24 (-0400), Tim Mattox wrote:

> I think a better approach than using NFS-root or LiveCDs is to use Perceus in this situation, since it has been developed over many years to handle this sort of thing (diskless/stateless Beowulf clusters): http://www.perceus.org/
> It leverages PXE booting, so all you need to do on a per-node basis is enable PXE booting in the BIOS. The primary limitation I see would be if your Windows machines are set up to use DHCP to get their IP addresses from some server that is outside your control, since Perceus would need to take over DHCP services to do its magic.

At the risk of being slightly off-topic, Perceus actually has no problem working with a separate DHCP server. It has to be properly configured to hand out the payload, of course, but it works fine.

Michael
--
Michael Jennings
Linux Systems and Cluster Admin
UNIX and Cluster Computing Group
Bldg 50B-3209E  W: 510-495-2687
MS 050C-3396    F: 510-486-8615
[OMPI users] build problems - undefined reference to `lt_libltdlc_LTX_preloaded_symbols and libtool install
I'm building from today's svn checkout on an x86_64 CentOS 5 box (Linux 2.6.18-128.1.10.el5 #1 SMP) using

m4 (GNU M4) 1.4.13
automake (GNU automake) 1.11
autoconf (GNU Autoconf) 2.64
ltmain.sh (GNU libtool) 2.2.6
gcc (GCC) 4.3.2

and configured with

../configure --prefix=$HOME/openmpi --srcdir=.. --disable-mpi-f77 --disable-mpi-f90

and get

libtool: link: gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -fvisibility=hidden -o opal_wrapper opal_wrapper.o ../../../opal/.libs/libopen-pal.a -ldl -lnsl -lutil -lm -pthread
../../../opal/.libs/libopen-pal.a(libltdlc_la-ltdl.o): In function `lt_dlinit':
ltdl.c:(.text+0x10d3): undefined reference to `lt_libltdlc_LTX_preloaded_symbols'

Is anyone familiar with this or what to do about it? If I try to avoid it with

../configure --prefix=$HOME/openmpi --srcdir=.. --disable-mpi-f77 --disable-mpi-f90 --disable-dlopen

I 'make -j 4' successfully, but during 'make install' get

/bin/sh ../../../libtool --mode=install /usr/bin/install -c opal_wrapper '/home/hines/openmpi/bin'
./opal_wrapper: line 1: ELF: command not found
libtool: install: invalid libtool wrapper script `opal_wrapper'

Hints on how to build on this machine are greatly welcome. I had the same problems when using openmpi-1.3.3.tar.gz and my normal development environment (less recent m4 and autotools, and gcc-4.1.2).

Thanks,
Michael
Re: [OMPI users] build problems - undefined referenceto `lt_libltdlc_LTX_preloaded_symbols and libtool install
Hello. Thanks!

On Wed, 2009-09-02 at 10:51 +0300, Jeff Squyres wrote:
> On Aug 27, 2009, at 8:34 PM, Michael Hines wrote:
...
> > ltdl.c:(.text+0x10d3): undefined reference to
> > `lt_libltdlc_LTX_preloaded_symbols'
>
> Hmm. This feels like a mismatch of libtool somehow... (ltdl is a part of the larger Libtool package). Can you send all the information listed here:

Enclosed. I started from openmpi-1.3.3.tar.gz and

tar xzf ~/Desktop/openmpi-1.3.3.tar.gz
cd openmpi-1.3.3
./configure --prefix=$HOME/openmpi --disable-mpi-f77 --disable-mpi-f90 &> config.out
make all &> make.out
tar czf ompibld.tgz config.out config.log make.out

...

> > /bin/sh ../../../libtool --mode=install /usr/bin/install -c opal_wrapper '/home/hines/openmpi/bin'
> > ./opal_wrapper: line 1: ELF: command not found
> > libtool: install: invalid libtool wrapper script `opal_wrapper'
>
> This seems like an even bigger problem -- ELF is not a command, so how it's trying to execute that seems pretty nebulous.

I guess opal_wrapper is supposed to be a script that contains the executable, but it turned out to be just the executable. If the earlier issue is resolved (presumably libtool related), this may go away as well.

-Michael
Re: [OMPI users] build problems - undefined referenceto `lt_libltdlc_LTX_preloaded_symbols and libtool install
On Wed, 2009-09-02 at 10:51 +0300, Jeff Squyres wrote:
> On Aug 27, 2009, at 8:34 PM, Michael Hines wrote:
> > libtool: link: gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -fvisibility=hidden -o opal_wrapper opal_wrapper.o ../../../opal/.libs/libopen-pal.a -ldl -lnsl -lutil -lm -pthread
> > ../../../opal/.libs/libopen-pal.a(libltdlc_la-ltdl.o): In function `lt_dlinit':
> > ltdl.c:(.text+0x10d3): undefined reference to `lt_libltdlc_LTX_preloaded_symbols'
>
> Hmm. This feels like a mismatch of libtool somehow... (ltdl is a part of the larger Libtool package).

I should mention that when I run

[hines@hines490 openmpi-1.3.3]$ for i in `find . -name \*.o -print` ; do echo $i ; nm $i | grep preloaded ; done

the only files that have "preloaded" in them are

./opal/libltdl/libltdlc_la-preopen.o
0008 b default_preloaded_symbols
     b preloaded_symlists
./opal/libltdl/libltdlc_la-ltdl.o
     U lt_libltdlc_LTX_preloaded_symbols

and I can't find anywhere that lt_libltdlc_LTX_preloaded_symbols is defined.

-Michael
[OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)
Dear readers,

With OpenMPI, how would one go about requesting to load environment modules (of the http://modules.sourceforge.net/ kind) on remote nodes, augmenting those normally loaded there by shell dotfiles?

Background: I run a RHEL-5/CentOS-5 cluster. I load a bunch of default modules through /etc/profile.d/ and recommend that users customize modules in ~/.bashrc. A problem arises for PBS jobs which might need job-specific modules, e.g., to pick a specific flavor of an application. With other MPI implementations (ahem) which export all (or judiciously nearly all) environment variables by default, you can say:

#PBS ...
module load foo    # not for OpenMPI
mpirun -np 42 ... \
    bar-app

Not so with OpenMPI - any such customization is only effective for processes on the master (=local) node of the job, and any variables changed by a given module would have to be specifically passed via mpirun -x VARNAME. On the remote nodes, those variables are not available in the dotfiles because they are passed only once orted is live (after dotfile processing by the shell), which then immediately spawns the application binaries (right?)

I thought along the following lines:

(1) I happen to run Lustre, which would allow writing a file coherently across nodes prior to mpirun, and thus hooking into the shell dotfile processing, but that seems rather crude.

(2) "mpirun -x PATH -x LD_LIBRARY_PATH ..." would take care of a lot, but is not really general.

Is there a recommended way?

regards,
Michael
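[Editor's note: for concreteness, the mpirun -x workaround described above looks like this in a job script. This is a sketch only -- the module name "mymod", the variable MYMOD_HOME, and the resource requests are hypothetical, and every variable the module touches has to be named by hand:]

```shell
#PBS -l nodes=2:ppn=8
#PBS -N envtest

# Loading the module only changes the environment on the master node
# of the job; remote ranks never see it through their dotfiles:
module load mymod

# So each variable the module set must be forwarded explicitly.
# mpirun -x exports the named variable from the local environment
# to every launched process:
mpirun -np 16 \
    -x PATH -x LD_LIBRARY_PATH -x MYMOD_HOME \
    bar-app
```

This works, but as noted above it is not general: a module may set arbitrary variables, and each one needs its own -x flag.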
Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)
Hi David,

Hmm, your demo is well-chosen and crystal-clear, yet the output is unexpected. I do not see environment vars passed by default here:

login3$ qsub -l nodes=2:ppn=1 -I
qsub: waiting for job 34683.mds01 to start
qsub: job 34683.mds01 ready
n102$ mpirun -n 2 -machinefile $PBS_NODEFILE hostname
n102
n085
n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
n102$ export FOO=BAR
n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
FOO=BAR
n102$ type mpirun
mpirun is hashed (/opt/soft/openmpi-1.3.2-intel10-1/bin/mpirun)

Curious, what do you get upon:

where mpirun

I built OpenMPI-1.3.2 here from source with:

CC=icc CXX=icpc FC=ifort F77=ifort \
LDFLAGS='-Wl,-z,noexecstack' \
CFLAGS='-O2 -g -fPIC' \
CXXFLAGS='-O2 -g -fPIC' \
FFLAGS='-O2 -g -fPIC' \
./configure --prefix=$prefix \
    --with-libnuma=/usr \
    --with-openib=/usr \
    --with-udapl \
    --enable-mpirun-prefix-by-default \
    --without-tm

I didn't find the behavior I saw strange, given that orterun(1) talks only about $OMPI_* and inheritance from the remote shell. It also mentions a "boot MCA module", about which I couldn't find much on open-mpi.org - hmm.

In the meantime, I did find a possible solution, namely, to tell ssh to pass a variable using SendEnv/AcceptEnv. That variable is then seen by and can be interpreted (cautiously) in /etc/profile.d/ scripts. A user could set it in the job file (or even qalter it post-submission):

#PBS -v VARNAME=foo:bar:baz

For VARNAME, I think simply "MODULES" or "EXTRAMODULES" could do.

With best regards,
Michael

On Nov 17, 2009, at 4:29 , David Singleton wrote:
>
> I'm not sure why you don't see Open MPI behaving like other MPIs w.r.t. modules/environment on remote MPI tasks - we do.
> > xe:~ > qsub -q express -lnodes=2:ppn=8,walltime=10:00,vmem=2gb -I > qsub: waiting for job 376366.xepbs to start > qsub: job 376366.xepbs ready > > [dbs900@x27 ~]$ module load openmpi > [dbs900@x27 ~]$ mpirun -n 2 --bynode hostname > x27 > x28 > [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO > [dbs900@x27 ~]$ setenv FOO BAR > [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO > FOO=BAR > FOO=BAR > [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber > [dbs900@x27 ~]$ module load amber > [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber > LOADEDMODULES=openmpi/1.3.3:amber/9 > PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin::/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe > _LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9 > AMBERHOME=/apps/amber/9 > LOADEDMODULES=openmpi/1.3.3:amber/9 > PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin:/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe > _LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9 > AMBERHOME=/apps/amber/9 > > David > > > Michael Sternberg wrote: >> Dear readers, >> With OpenMPI, how would one go about requesting to load environment modules >> (of the http://modules.sourceforge.net/ kind) on remote nodes, augmenting >> those normally loaded there by shell dotfiles? >> Background: >> I run a RHEL-5/CentOS-5 cluster. I load a bunch of default modules through >> /etc/profile.d/ and recommend to users to customize modules in ~/.bashrc. A >> problem arises for PBS jobs which might need job-specific modules, e.g., to >> pick a specific flavor of an application. With other MPI implementations >> (ahem) which export all (or judiciously nearly all) environment variables by >> default, you can say: >> #PBS ... >> module load foo # not for OpenMPI >> mpirun -np 42 ... 
\ >> bar-app >> Not so with OpenMPI - any such customization is only effective for processes >> on the master (=local) node of the job, and any variables changed by a given >> module would have to be specifically passed via mpirun -x VARNAME. On the >> remote nodes, those variables are not available in the dotfiles because they >> are passed only once orted is live (after dotfile processing by the shell), >> which then immediately spawns the application binaries (right?) >> I thought along the following lines: >> (1) I happen to run Lustre, which would allow writing a file coherently >> across nodes prior to mpirun, and thus hook into the shell dotfile >> processing, but that seems rather crude. >> (2) "mpirun -x PATH -x LD_LIBRARY_PATH …" would take care of a lot, but is >> not really general. >> Is there a recommended way? >> regards, >> Michael > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)
Hi, On Nov 17, 2009, at 9:10 , Ralph Castain wrote: > Not exactly. It completely depends on how Torque was setup - OMPI isn't > forwarding the environment. Torque is. I actually tried compiling OMPI with the tm interface a couple of versions back for both packages but ran into memory trouble, which is why I didn't pursue this. With torque-2.4.x out and OpenMPI getting close to 1.3.4 I'll try again. > We made a design decision at the very beginning of the OMPI project not to > forward non-OMPI envars unless directed to do so by the user. I'm afraid I > disagree with Michael's claim that other MPIs do forward them - yes, MPICH > does, but not all others do. > > The world is bigger than MPICH and OMPI :-) Yup, I saw your message from just last month http://www.open-mpi.org/community/lists/users/2009/10/10994.php ; I didn't mean to make a global claim :-) I'm aware that exporting environment variables (including $PWD) under MPI is implementation dependent. I just happened to have MPICH, Intel MPI (same roots), and OpenMPI on my cluster. > First, if you are using a managed environment like Torque, we recommend that > you work with your sys admin to decide how to configure it. This is the best > way to resolve a problem. Yeah, I wish that guy would know better and not have to ask around mailing lists :-) > Second, if you are not using a managed environment and/or decide not to have > that environment do the forwarding, you can tell OMPI to forward the envars > you need by specifying them via the -x cmd line option. We already have a > request to expand this capability, and I will be doing so as time permits. > One option I'll be adding is the reverse of -x - i.e., "forward all envars > -except- the specified one(s)". The issue with -x is that modules may set any random variable. The reverse option to -x would be great of course. MPICH2 and Intel MPI pass all but a few (known to be host-specific) variables by default, and counter that with "none" and "all" options. 
Thanks! Michael > HTH > ralph > > On Nov 17, 2009, at 5:55 AM, David Singleton wrote: > >> >> I can see the difference - we built Open MPI with tm support. For some >> reason, I thought mpirun fed its environment to orted (after orted is >> launched) so orted can pass it on to MPI tasks. That should be portable >> between different launch mechanisms. But it looks like tm launches >> orted with the full mpirun environment (at the request of mpirun). >> >> Cheers, >> David >> >> >> Michael Sternberg wrote: >>> Hi David, >>> Hmm, your demo is well-chosen and crystal-clear, yet the output is >>> unexpected. I do not see environment vars passed by default here: >>> login3$ qsub -l nodes=2:ppn=1 -I >>> qsub: waiting for job 34683.mds01 to start >>> qsub: job 34683.mds01 ready >>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE hostname >>> n102 >>> n085 >>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO >>> n102$ export FOO=BAR >>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO >>> FOO=BAR >>> n102$ type mpirun >>> mpirun is hashed (/opt/soft/openmpi-1.3.2-intel10-1/bin/mpirun) >>> Curious, what do you get upon: >>> where mpirun >>> I built OpenMPI-1.3.2 here from source with: >>> CC=icc CXX=icpc FC=ifort F77=ifort \ >>> LDFLAGS='-Wl,-z,noexecstack' \ >>> CFLAGS='-O2 -g -fPIC' \ >>> CXXFLAGS='-O2 -g -fPIC' \ >>> FFLAGS='-O2 -g -fPIC' \ >>> ./configure --prefix=$prefix \ >>> --with-libnuma=/usr \ >>> --with-openib=/usr \ >>> --with-udapl \ >>> --enable-mpirun-prefix-by-default \ >>> --without-tm >>> I did't find the behavior I saw strange, given that orterun(1) talks only >>> about $OPMI_* and inheritance from the remote shell. It also mentions a >>> "boot MCA module", about which I couldn't find much on open-mpi.org - hmm. >>> In the meantime, I did find a possible solution, namely, to tell ssh to >>> pass a variable using SendEnv/AcceptEnv. That variable is then seen by and >>> can be interpreted (cautiously) in /etc/profile.d/ scripts. 
A user could >>> set it in the job file (or even qalter it post submission): >>> #PBS -v VARNAME=foo:bar:baz >>> For VARNAME, I think simply "MODULES" or "EXTRAMODULES" could do. >>> With best regards, >>> Michael >>> On Nov 17, 2009, at 4
Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)
On Nov 17, 2009, at 10:17 , Michael Sternberg wrote:
> On Nov 17, 2009, at 9:10 , Ralph Castain wrote:
> > Not exactly. It completely depends on how Torque was setup - OMPI isn't forwarding the environment. Torque is.
>
> I actually tried compiling OMPI with the tm interface a couple of versions back for both packages but ran into memory trouble, which is why I didn't pursue this. With torque-2.4.x out and OpenMPI getting close to 1.3.4 I'll try again.

Follow-up: I recompiled OpenMPI-1.3.2 "--with-tm" (from torque-2.3.6) and, lo and behold, environment variables and modules are now passed across nodes, which thus includes custom modules loaded in the job file. Darn, that was an old hang-up!

The variables passed do include (unsurprisingly) $HOSTNAME, but I can live with that:

login4 $ qsub -l nodes=2:ppn=1 -I
qsub: waiting for job 34717.mds01 to start
qsub: job 34717.mds01 ready
n102 $ mpirun hostname
n102
n091
n102 $ mpirun env | grep HOSTNAME
HOSTNAME=n102
HOSTNAME=n102

Ralph, David - thank you for the pointers!

Michael