Re: [OMPI users] automatically creating a machinefile

2012-07-05 Thread Reuti
Am 05.07.2012 um 00:38 schrieb Dominik Goeddeke:

> No idea about Rocks, but with PBS and SLURM I always do this directly in the 
> job submission script. Below is an example of an admittedly spaghetti-code 
> script that does this -- assuming proper (un)commenting -- for PBS and 
> SLURM

For SLURM, Torque and GridEngine I would suggest using the built-in support in 
Open MPI and MPICH2 directly, so there is no need to build a machinefile from 
the given list of selected nodes. This also provides tight integration of the 
parallel job with the queuing system, so jobs can be removed cleanly.
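
A minimal sketch of such a script (assuming an Open MPI build with Torque/tm 
support and a hypothetical executable ./my_app -- adjust names and resource 
requests to your site):

  #!/bin/bash
  #PBS -N myjob
  #PBS -l nodes=25:ppn=2
  #PBS -l walltime=2:00:00
  cd $PBS_O_WORKDIR
  # With tm support compiled in, mpirun obtains the allocated nodes and
  # slots from Torque itself, so no machinefile/-hostfile is needed; it
  # starts one process per allocated slot unless -np says otherwise.
  mpirun ./my_app

The same holds under SLURM and GridEngine when the corresponding support is 
compiled into Open MPI (or MPICH2).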

Is there any reason to bypass this mechanism?

-- Reuti


>  and OpenMPI and MPICH2, for one particular machine that I have been toying 
> around with lately ... 
> 
> Dominik
> 
> #!/bin/bash
> 
> ### PBS
> #PBS -N feast
> #PBS -l nodes=25:ppn=2
> #PBS -q batch
> #PBS -l walltime=2:00:00
> #job should not rerun if it fails
> #PBS -r n
> 
> ### SLURM
> # @ job_name = feaststrong1
> # @ initialdir = .
> # @ output = feaststrong1_%j.out
> # @ error = feaststrong1_%j.err
> # @ total_tasks = 50
> # @ cpus_per_task = 1
> # @ wall_clock_limit = 2:00:00
> 
> # modules
> module purge
> module load gcc/4.6.2
> module load openmpi/1.5.4
> #module load mpich2/1.4.1
> 
> # cd into wdir
> cd $HOME/feast/feast/feast/applications/poisson_coproc
> 
> 
> # PBS with MPICH2
> # create machine files to isolate the master process
> #cat $PBS_NODEFILE > nodes.txt
> ## extract slaves
> #sort -u  nodes.txt > temp.txt
> #lines=`wc -l temp.txt | awk '{print $1}'`
> #((lines=$lines - 1))
> #tail -n $lines temp.txt > slavetemp.txt
> #cat slavetemp.txt | awk '{print $0 ":2"}' > slaves.txt
> ## extract master
> #head -n 1 temp.txt > mastertemp.txt
> #cat mastertemp.txt | awk '{print $0 ":1"}' > master.txt
> ## merge into one dual nodefile
> #cat master.txt > dual.hostfile
> #cat slaves.txt >> dual.hostfile 
> ## same for single hostfile
> #tail -n $lines temp.txt > slavetemp.txt
> #cat slavetemp.txt | awk '{print $0 ":1"}' > slaves.txt
> ## extract master
> #head -n 1 temp.txt > mastertemp.txt
> #cat mastertemp.txt | awk '{print $0 ":1"}' > master.txt
> ## merge into one single nodefile
> #cat master.txt > single.hostfile
> #cat slaves.txt >> single.hostfile
> ## and clean up
> #rm -f slavetemp.txt mastertemp.txt master.txt slaves.txt temp.txt nodes.txt
> 
> # 4 nodes
> #mpiexec -n 7 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np007.dat
> #mkdir arm-strongscaling-series1-L8-nodes04
> #mv feastlog.* arm-strongscaling-series1-L8-nodes04
> 
> # 7 nodes
> #mpiexec -n 13 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np013.dat
> #mkdir arm-strongscaling-series1-L8-nodes07
> #mv feastlog.* arm-strongscaling-series1-L8-nodes07
> 
> # 13 nodes
> #mpiexec -n 25 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np025.dat
> #mkdir arm-strongscaling-series1-L8-nodes13
> #mv feastlog.* arm-strongscaling-series1-L8-nodes13
> 
> # 25 nodes
> #mpiexec -n 49 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np049.dat
> #mkdir arm-strongscaling-series1-L8-nodes25
> #mv feastlog.* arm-strongscaling-series1-L8-nodes25
> 
> 
> ## SLURM
> 
> # figure out which nodes we got
> srun /bin/hostname | sort > availhosts3.txt
> 
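> # as in the PBS section above: give the first host only one slot in the
> # hostfile so the master process is isolated on it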
> lines=`wc -l availhosts3.txt | awk '{print $1}'`
> ((lines=$lines - 2))
> tail -n $lines availhosts3.txt > slaves3.txt
> head -n 1 availhosts3.txt > master3.txt
> cat master3.txt > hostfile3.txt
> cat slaves3.txt >> hostfile3.txt
> # DGDG: SLURM -m arbitrary not supported by OpenMPI
> #export SLURM_HOSTFILE=./hostfile3.txt
> 
> 
> # 4 nodes
> #mpirun -np 7 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np007.dat
> mpirun -np 7 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np007.dat
> #mpiexec -n 7 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np007.dat
> #srun -n 7 -m arbitrary ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np007.dat
> mkdir arm-strongscaling-series1-L8-nodes04
> mv feastlog.* arm-strongscaling-series1-L8-nodes04
> 
> # 7 nodes
> #mpirun -np 13 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np013.dat
> mpirun -np 13 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np013.dat
> #mpiexec -n 13 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np013.dat
> #srun -n 13 -m arbitrary ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np013.dat
> mkdir arm-strongscaling-series1-L8-nodes07
> mv feastlog.* arm-strongscaling-series1-L8-nodes07
> 
> # 13 nodes
> #mpirun -np 25 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np025.dat
> mpirun -np 25 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np025.dat
> #mpiexec -n 25 -f dual.hostfile ./feastgpu-mpich2 
> mast

Re: [OMPI users] Getting MPI to access processes on a 2nd computer.

2012-07-05 Thread VimalMathew
Hi Shiqing,

 

We went through the steps mentioned in the links to modify DCOM and COM
settings.

wmic /node:remote_node_ip process call create notepad.exe is able to
create a notepad process remotely, but I'm getting the same error
message as before when using mpirun -np 2 -host host1,host2 notepad.exe.

 

I'm running this on two Windows 7 machines both of which I have admin
rights on.

Any suggestions?

 

Thanks,

Vimal 

 

From: Shiqing Fan [mailto:f...@hlrs.de] 
Sent: Wednesday, July 04, 2012 5:28 AM
To: Open MPI Users
Cc: Mathew, Vimal
Subject: Re: [OMPI users] Getting MPI to access processes on a 2nd
computer.

 

Hi,

On Windows, Open MPI uses WMI to launch remote processes, so WMI has to be
configured correctly. The README.WINDOWS file points to two links that describe
how to set it up:

http://msdn.microsoft.com/en-us/library/aa393266(VS.85).aspx
 
http://community.spiceworks.com/topic/578

To test whether it works, you can use the following command:
wmic /node:remote_node_ip process call create notepad.exe

then log onto the other Windows machine and check in Task Manager whether the
notepad.exe process was created (don't forget to end the process afterwards).

If that works, this command should also work:
mpirun -np 2 -host host1,host2 notepad.exe

Please try to run the above two test commands; if they both work, your
application should also work. Just let me know if you have any questions
or trouble with that.


Shiqing

On 2012-07-03 8:53 PM, vimalmat...@eaton.com wrote:

Hi,

 

I'm trying to run an MPI code using processes on a remote machine.

I've connected the 2 machines using a crossover cable and they are
communicating with each other (I'm getting ping replies and each machine can
access the other's drives).

 

When I run mpiexec -host system_name MPI_Test.exe, I get the
following error:

 

C:\OpenMPI\openmpi-1.6\build\Debug>mpiexec -host SOUMIWHP4500449 MPI_Test.exe

connecting to SOUMIWHP4500449

username:C9995799

password:**

Save Credential?(Y/N) N

[SOUMIWHP5003567:01728] Could not connect to namespace cimv2 on node SOUMIWHP4500449. Error code =-2147023174



--

mpiexec was unable to start the specified application as it encountered an error.

More information may be available above.



--

[SOUMIWHP5003567:01728] [[38316,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ..\..\..\openmpi-1.6\orte\mca\rml\oob\rml_oob_send.c at line 145

[SOUMIWHP5003567:01728] [[38316,0],0] attempted to send to [[38316,0],1]: tag 1

[SOUMIWHP5003567:01728] [[38316,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ..\..\..\openmpi-1.6\orte\orted\orted_comm.c at line 126

 

Could anyone tell me what I'm missing?

 

I've configured MPI on VS Express 2010 and I'm able to run MPI programs on
one system.

On the other computer, I placed the MPI_Test.exe file in the same location as
on the calling computer.

 

Thanks,
Vimal

 












-- 
---
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Tel: ++49(0)711-685-87234  Nobelstrasse 19
Fax: ++49(0)711-685-65832  70569 Stuttgart
http://www.hlrs.de/organization/people/shiqing-fan/
email: f...@hlrs.de


Re: [OMPI users] ompi mca mxm version

2012-07-05 Thread SLIM H.A.
Hi

Do you have any details about the performance of mxm, e.g. for real 
applications?

Thanks

Henk

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Mike Dubman
Sent: 11 May 2012 19:23
To: Open MPI Users
Subject: Re: [OMPI users] ompi mca mxm version

ob1/openib is RC (reliable connection) based, which has scalability issues; mxm 1.1 is 
UD (unreliable datagram) based and kicks in at scale.
We observe that mxm outperforms ob1 on 8+ nodes.

We will update the docs as you mentioned, thanks

Regards




On Thu, May 10, 2012 at 4:30 PM, Derek Gerstmann <derek.gerstm...@uwa.edu.au> wrote:
On May 9, 2012, at 7:41 PM, Mike Dubman wrote:

> you need latest OMPI 1.6.x and latest MXM 
> (ftp://bgate.mellanox.com/hpc/mxm/v1.1/mxm_1.1.1067.tar)
Excellent!  Thanks for the quick response!  Using the MXM v1.1.1067 against 
OMPI v1.6.x did the trick.  Please (!!!) add a note to the docs for OMPI 1.6.x 
to help out other users -- there's zero mention of this anywhere that I could 
find from scouring the archives and source code.

Sadly, performance isn't what we'd expect.  OB1 is outperforming CM MXM 
(consistently).

Are there any suggested configuration settings?  We tried all the obvious ones 
listed in the OMPI Wiki and mailing list archives, but few have had much of an 
effect.

We seem to do better with the OB1 openib BTL than with the lower-level CM MXM MTL.  Any 
suggestions?

Here are numbers from the OSU Micro-Benchmarks (the MBW_MR test) running on 2 
pairs, i.e. 4 separate hosts, each using Mellanox ConnectX (one card per host, 
single port, single switch):

-- OB1
> /opt/openmpi/1.6.0/bin/mpiexec -np 4 --mca pml ob1 --mca btl ^tcp --mca mpi_use_pinned 1 -hostfile all_hosts ./osu-micro-benchmarks/osu_mbw_mr
# OSU MPI Multiple Bandwidth / Message Rate Test v3.6
# [ pairs: 2 ] [ window size: 64 ]
# Size       MB/s       Messages/s
1            2.91       2909711.73
2            5.97       2984274.11
4           11.70       2924292.78
8           23.00       2874502.93
16          44.75       2796639.64
32          89.49       2796639.64
64         175.98       2749658.96
128        292.41       2284459.86
256        527.84       2061874.61
512        961.65       1878221.77
1024      1669.06       1629943.87
2048      2220.43       1084193.45
4096      2906.57        709611.68
8192      3017.65        368365.70
16384     5225.97        318967.95
32768     5418.98        165374.23
65536     5998.07         91523.27
131072    6031.69         46018.16
262144    6063.38         23129.97
524288    5971.77         11390.24
1048576   5788.75          5520.59
2097152   5791.39          2761.55
4194304   5820.60          1387.74

-- MXM
> /opt/openmpi/1.6.0/bin/mpiexec -np 4 --mca pml cm --mca mtl mxm --mca btl ^tcp --mca mpi_use_pinned 1 -hostfile all_hosts ./osu-micro-benchmarks/osu_mbw_mr
# OSU MPI Multiple Bandwidth / Message Rate Test v3.6
# [ pairs: 2 ] [ window size: 64 ]
# Size       MB/s       Messages/s
1            2.07       2074863.43
2            4.14       2067830.81
4           10.57       2642471.39
8           23.16       2895275.37
16          38.73       2420627.22
32          66.77       2086718.41
64         147.87       2310414.05
128        284.94       2226109.85
256        537.27       2098709.64
512       1041.91       2034989.43
1024      1930.93       1885676.34
2048      1998.68        975916.00
4096      2880.72        703299.77
8192      3608.45        440484.17
16384     4027.15        245797.51
32768     4464.85        136256.47
65536     4594.22         70102.23
131072    4655.62         35519.55
262144    4671.56         17820.58
524288    4604.16          8781.74
1048576   4635.51          4420.77
2097152   3575.17          1704.78
4194304   2828.19           674.29

Thanks!

-[dg]

Derek Gerstmann, PhD Student
The University of Western Australia (UWA)

w: http://local.ivec.uwa.edu.au/~derek
e: derek.gerstmann [at] icrar.org
On May 9, 2012, at 7:41 PM, Mike Dubman wrote:

> you need latest OMPI 1.6.x and latest MXM 
> (ftp://bgate.mellanox.com/hpc/mxm/v1.1/mxm_1.1.1067.tar)
>
>
>
> On Wed, May 9, 2012 at 6:02 AM, Derek Gerstmann <derek.gerstm...@uwa.edu.au> wrote:
> What versions of OpenMPI and the Mellanox MXM libraries have been tested and 
> verified to work?
>
> We are currently trying to build OpenMPI v1.5.5 against the MXM 1.0.601 
> (included in the MLNX_OFED_LINUX-1.5.3-3.0.0 distribution) an

Re: [OMPI users] automatically creating a machinefile

2012-07-05 Thread Gus Correa

Hi Erin

You should follow Dominik's and Reuti's suggestions
and use Open MPI's [and MPICH2's, if you want] built-in support
for the resource manager [Torque, Slurm, SGE].

Which resource manager is installed on your Rocks cluster
depends on how it was built.
Rocks can be built with either SGE or Torque, and perhaps,
though not so easily, with Slurm as well.
You may need to ask the system administrator or
whoever built/knows the cluster.

However, 'man qsub' may give you a hint [it will show PBS if you
have Torque/PBS, and probably SGE if you have that].

We have Torque here, so my answers are focused on Torque/PBS,
but there are equivalent workarounds for SGE, I guess.

***

My recollection is that the Open MPI that comes native with
Rocks is *not* built with either SGE or Torque support.
Hence, it won't pick up the node file that the resource manager
allocated to your job and use it as a machinefile, which is
what you probably want.
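
One quick way to check is to look at the launch components that ompi_info
reports (a sketch -- component names can vary a bit across Open MPI versions):

  ompi_info | grep -E "ras|plm"

A build with Torque support lists 'tm' components there; a build with SGE
support lists 'gridengine'.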

***

If you're using Torque, a workaround
with the native Rocks OpenMPI
is to use the $PBS_NODEFILE file as your machine file,
e.g., inside your job submission script:

cd $PBS_O_WORKDIR # this is to get to the work directory

mpiexec -np 32 -hostfile $PBS_NODEFILE ./my_mpi_program
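
Wrapped into a complete script, with -np matched to the nodes/ppn request,
that would look roughly like this (a sketch only -- the program name and
resource requests are placeholders):

  #!/bin/bash
  #PBS -N myjob
  #PBS -l nodes=16:ppn=2
  #PBS -l walltime=1:00:00
  cd $PBS_O_WORKDIR
  # Torque writes the allocated node list to $PBS_NODEFILE; pass it to
  # mpiexec explicitly, since this build has no tm support.
  mpiexec -np 32 -hostfile $PBS_NODEFILE ./my_mpi_program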

***

A notch up is to install an alternative build of
Open MPI in your area, with Torque or SGE support built in.
This is as easy as 'configure; make; make install', as long as
you use the right flags for configure:

Download the source code:

http://www.open-mpi.org/software/ompi/v1.6/

You can use gcc, g++, and gfortran to build Open MPI, if they are
installed on your cluster, as in the example below, or other compilers.

$ cd $HOME/Downloads

$ tar -jxvf openmpi-1.6.tar.bz2

$ cd openmpi-1.6

$ ./configure --prefix=$HOME/openmpi-1.6.0 CC=gcc CXX=g++ F77=gfortran FC=gfortran


If you have Torque, add this option to the command line above,
to get native Torque support:

--with-tm=/path/to/torque # wherever libtorque is installed

There is a similar option to build with SGE support, if you
have SGE, just do ./configure --help to see all options.

Also, if you have infiniband, and if it is installed in a
non-standard location, to build with infiniband support you
need to add this other option to the configure command line:

--with-openib=/path/to/openib   # wherever librdma and libverbs are installed
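
Putting it together, a configure line for a build with both Torque and
InfiniBand support might look like this sketch (the library paths are
placeholders for wherever they live on your cluster):

  $ ./configure --prefix=$HOME/openmpi-1.6.0 \
      CC=gcc CXX=g++ F77=gfortran FC=gfortran \
      --with-tm=/opt/torque --with-openib=/usr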


Then do:

$ make

$ make install

**

Check the README file to see if there were recent changes in
the configure options, please.

Do ./configure --help to see all options.

The FAQ is effectively the Open MPI documentation:

http://www.open-mpi.org/faq/

I hope this helps,
Gus Correa

On 07/04/2012 06:10 PM, Hodgess, Erin wrote:

Dear MPI people:

Is there a way (a script) available to automatically generate a
machinefile, please?

This would be on Rocks.

ompi_info -v ompi full --parsable
package:Open MPI r...@vi-1.rocksclusters.org Distribution
ompi:version:full:1.3.2
ompi:version:svn:r21054
ompi:version:release_date:Apr 21, 2009
orte:version:full:1.3.2
orte:version:svn:r21054
orte:version:release_date:Apr 21, 2009
opal:version:full:1.3.2
opal:version:svn:r21054
opal:version:release_date:Apr 21, 2009
ident:1.3.2

Thanks,
Erin



Erin M. Hodgess, PhD
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: hodge...@uhd.edu







Re: [OMPI users] fortran program with integer kind=8 using openmpi?

2012-07-05 Thread Jeff Squyres
On Jul 3, 2012, at 8:12 PM, Steve Kargl wrote:

>> Thank you for all responses. There is another problem using
>> -fdefault-integer-8.
> 
> I'll make the unsolicited suggestion that you really
> really really don't want to use the -fdefault-integer-8
> option.  It would be far better to actually audit your
> Fortran code and use Fortran's kind type mechanism to
> choose the appropriate kinds.

This is probably true.

> I think that you're just hitting the tip of the iceberg
> with problems and potential nasty bugs.

Agreed: there are likely to be more bugs than just this one.

I did file https://svn.open-mpi.org/trac/ompi/ticket/3163 about this problem, 
though, and will look into it.  But possibly not until next week...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/