[O-MPI users] Wrapper names [was: shell interaction]

2005-06-15 Thread Bogdan Costescu

On Tue, 14 Jun 2005, Brian Barrett wrote:


It would be nice if the C++ compiler wrapper were
installed under mpicxx, mpiCC, and mpic++ instead of
just the latter two.


Yeah, we can do that, no problem.


Sorry for the silly question, but is there any kind of document or 
formal recommendation regarding the naming of these wrappers?


--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: bogdan.coste...@iwr.uni-heidelberg.de


Re: [O-MPI users] Wrapper names [was: shell interaction]

2005-06-16 Thread Bogdan Costescu
On Wed, 15 Jun 2005, Brian Barrett wrote:

> Make any sense?

Makes a lot of sense. Thank you!




Re: [O-MPI users] error creating high priority cq for mthca0

2005-12-07 Thread Bogdan Costescu

On Tue, 6 Dec 2005, Jeff Squyres wrote:

With Tim's response to this -- I'm curious (so that we get correct 
information into the FAQ) -- is the /etc/security/limits.conf method a 
system-wide way of setting these values, and "ulimit -l" a per-user 
way of setting them?


You can see the official docs, including examples, at:

http://www.kernel.org/pub/linux/libs/pam/Linux-PAM-html/pam-6.html#ss6.12

You can think of the limits.conf mechanism as setting a per-session 
default (the soft limit) and a per-session maximum (the hard limit). 
These values are usually static; if the admin (or a program with 
sufficient privileges) modifies them, only the shells started 
afterwards get the new values, while the running shells keep their 
current ones.


So it's similar to the system-wide shell environment settings (e.g. 
/etc/csh.cshrc), which only take effect in shells started after the 
modifications were made. It's also dissimilar to writing to /proc/sys/, 
which usually affects all running processes.
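To make the two mechanisms concrete, here's a minimal sketch; the limits.conf lines and the memlock values below are purely illustrative, not a recommendation:

```shell
# /etc/security/limits.conf excerpt (illustrative values):
#
#   <domain>  <type>  <item>    <value>
#   *         soft    memlock   262144      # per-session default, in KB
#   *         hard    memlock   unlimited   # ceiling the user may raise it to

# Within a session, a user can inspect both limits:
ulimit -H -l    # hard limit (the ceiling)
ulimit -S -l    # soft limit (the current effective value)

# Raising the soft limit is allowed only up to the hard limit:
ulimit -S -l "$(ulimit -H -l)" 2>/dev/null || echo "could not raise to hard limit"
```

Note that, as discussed above, these settings are picked up per session via PAM; already-running shells are not affected.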



That doesn't sound quite right to me -- I'm assuming that a user
can't "ulimit -l X" where X is larger than the numbers in /etc/
security/limits.conf -- can someone confirm if this is Right?


... can't "ulimit -l X" where X is larger than the "hard" value from 
limits.conf.




Re: [O-MPI users] error creating high priority cq for mthca0

2005-12-07 Thread Bogdan Costescu


[ Sorry, I pressed the wrong keys too soon... ]

On Wed, 7 Dec 2005, Bogdan Costescu wrote:


... can't "ulimit -l X" where X is larger than the "hard" value from
limits.conf.


This is true for a normal user; root, however, can modify the hard 
limits in its shells. Also, a batch system running as root on a 
compute node can set the hard and soft values to something different 
from the system (limits.conf) values, via setrlimit(2), before 
starting the user's shell or compute process.


Is the limits.conf issue specific to Linux? Aren't there other OSes 
that use PAM and would be affected as well?




Re: [OMPI users] Shared-memory problems

2011-11-03 Thread Bogdan Costescu
On Thu, Nov 3, 2011 at 15:54, Blosch, Edwin L  wrote:
> -    /dev/shm is 12 GB and has 755 permissions
> ...
> % ls -l output:
>
> drwxr-xr-x  2 root root 40 Oct 28 09:14 shm

This is your problem: it should be something like drwxrwxrwt. It might
depend on the distribution; e.g. the following reports show this to be
a known bug:

https://bugzilla.redhat.com/show_bug.cgi?id=533897
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=317329

and surely you can find more on the subject with your favorite
search engine. Another possible cause is a paranoid sysadmin who has
changed the (most likely correct) default setting the distribution
came with; in that case not only OpenMPI but any application using
shared memory would be affected.
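For completeness, checking and restoring the conventional mode looks like the sketch below (the chmod must run as root; 1777 is the usual tmpfs mode, i.e. world-writable with the sticky bit so users can only remove their own files):

```shell
# Check the mount point's mode; a healthy tmpfs /dev/shm shows:
#   drwxrwxrwt  2 root root ... /dev/shm
ls -ld /dev/shm

# Restore world-writable plus sticky bit (as root):
chmod 1777 /dev/shm
```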

Cheers,
Bogdan



Re: [OMPI users] Qlogic & openmpi

2011-12-05 Thread Bogdan Costescu
On Mon, Dec 5, 2011 at 16:12, Ralph Castain  wrote:
> Sounds like we should be setting this value when starting the process - yes?
> If so, what is the "good" value, and how do we compute it?

I've also been looking at this for the past few days. What I came
up with is a small script, psm_shctx, which sets the envvar and then
execs the MPI binary; it is inserted between mpirun and the MPI binary:

mpirun psm_shctx my_mpi_app
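For the archives, a sketch of what such a wrapper can look like. The PSM_SHAREDCONTEXTS_MAX variable name is my assumption of the relevant PSM knob; check the PSM documentation for your stack before relying on it:

```shell
# Create the wrapper script; it only sets the envvar and hands over control.
cat > psm_shctx <<'EOF'
#!/bin/sh
# 4 matches the worst-case sharing ratio of 4 MPI ranks per context.
PSM_SHAREDCONTEXTS_MAX=4
export PSM_SHAREDCONTEXTS_MAX
exec "$@"
EOF
chmod +x psm_shctx

# Sanity check: the wrapper passes both the envvar and control to its argument.
./psm_shctx sh -c 'echo "PSM_SHAREDCONTEXTS_MAX=$PSM_SHAREDCONTEXTS_MAX"'
```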

Of course the same effect could be obtained if orted set the
envvar before starting the process. There is however a problem:
deciding how many contexts to use. For maximum performance, one should
use a 1:1 ratio between MPI ranks and contexts; the highest sharing
ratio possible (but with the lowest performance) is 4 MPI ranks per
context; a further restriction is that each job needs at least 1 context.

For example, on AMD cluster nodes with 4 CPUs of 12 cores each (48
cores in total) one gets 16 contexts; assigning all 16 contexts to 48
ranks would mean a ratio of 1:3, but this can only apply if the
allocation of cores is done in multiples of 4; with a less advantageous
allocation strategy, more contexts are lost to rounding up. At the
extreme, with only one rank per job, there can be at most 16 jobs
using the 16 contexts, and the remaining 32 cores have to stay idle or
be used for other jobs that don't communicate over InfiniPath.

There is a further issue though: MPI-2 dynamic process creation. If
it's not known in advance how many ranks there will be, I guess one
should use the highest context-sharing ratio (1:4) to be on the safe side.

I've found a mention of this envvar being handled in the changelog for
MVAPICH2 1.4.1; maybe that can serve as a source of inspiration?
(I haven't looked at it myself...)

Hope this helps,
Bogdan


Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-13 Thread Bogdan Costescu
On Thu, Jan 12, 2012 at 16:10, Jeff Squyres  wrote:
> It's very strange to me that Open MPI is getting *better* than raw TCP 
> performance.  I don't have an immediate explanation for that -- if you're 
> using the TCP BTL, then OMPI should be using TCP sockets, just like netpipe 
> and the others.

Could it be that the difference is not in the transfer but in the timing?

Cheers,
Bogdan



Re: [OMPI users] Network connection check

2009-07-23 Thread Bogdan Costescu

On Thu, 23 Jul 2009, vipin kumar wrote:


1. Check whether the slave machine is reachable or not. (How will I do
that? Given: I have the IP address and host name of the slave machine.)

2. If reachable, check whether the programs (orted and "slaveprocess")
are alive or not.


You don't specify, but based on your description I infer, that you are 
not using a batch/queueing system, just an rsh/ssh-based start-up 
mechanism. A batch/queueing system might be able to tell you whether a 
remote computer is still accessible.


I think that MPI is not the proper mechanism to achieve what you want. 
PVM or, perhaps, direct socket programming would probably serve 
you better.


--
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.coste...@iwr.uni-heidelberg.de


Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Bogdan Costescu


Brett Pemberton  wrote:

[[1176,1],0][btl_openib_component.c:2905:handle_wc] from 
tango092.vpac.org to: tango090 error polling LP CQ with status RETRY 
EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0


I've seen this error with Mellanox ConnectX cards and OFED 1.2.x, with 
all versions of OpenMPI that I have tried (1.2.x and pre-1.3) and some 
MVAPICH versions, from which I concluded that the problem lies in 
the lower levels (OFED or the IB card firmware). Indeed, after the 
installation of OFED 1.3.x and possibly a firmware update (I'm not sure 
about the firmware, as I don't admin that cluster), these errors 
disappeared.




Re: [OMPI users] mpirun hangs when launching job on remote node

2009-03-18 Thread Bogdan Costescu

On Wed, 18 Mar 2009, Raymond Wan wrote:


Perhaps it has something to do with RH's defaults for the firewall settings?


If your sysadmin uses kickstart to configure the systems, (s)he has to 
add 'firewall --disabled'; the same goes for SELinux, which seems to 
have caused problems for another person on this list. OTOH, if (s)he 
blindly copied the config of a workstation onto a cluster node, maybe 
some more education is needed first...



Another system that worked "immediately" was a Debian system.


That's because Debian doesn't configure a firewall or SELinux, leaving 
the admin the responsibility to do it.


Anyway, if you find out a solution that doesn't require the firewall 
to be turned off, please let me know -- I think our sysadmin would 
be interested, too.


Depending on your definition of 'firewall turned off', the new feature 
of restricting the ports used by OpenMPI will help: the firewall can 
stay on, but it should be configured to open the range of ports used 
by OpenMPI.
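For later readers: the MCA parameter names below are from my memory and may differ between versions; verify them with 'ompi_info --param btl tcp' and 'ompi_info --param oob tcp' before use:

```shell
# Pin MPI (BTL) traffic and the runtime (OOB) traffic to known port windows:
mpirun --mca btl_tcp_port_min_v4 46000 --mca btl_tcp_port_range_v4 100 \
       --mca oob_tcp_static_ports 46100-46199 \
       -np 16 ./my_mpi_app

# ...and open only those windows in the firewall, e.g. with iptables (as root):
#   iptables -A INPUT -p tcp --dport 46000:46199 -j ACCEPT
```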




Re: [OMPI users] Linux opteron infiniband sunstudio configure, problem

2009-03-31 Thread Bogdan Costescu

On Tue, 31 Mar 2009, Jeff Squyres wrote:


UNAME_REL=`(/bin/uname -X|grep Release|sed -e 's/.*= //')`


Not sure what you want to achieve here... 'uname -X' is valid on 
Solaris but not on Linux. The OP has already indicated that he is 
running this on Linux (SLES), so the above line is expected to fail.




Re: [OMPI users] Linux opteron infiniband sunstudio configure, problem

2009-03-31 Thread Bogdan Costescu

On Tue, 31 Mar 2009, Bogdan Costescu wrote:


'uname -X' is valid on Solaris, but not on Linux.


It's not good to reply to oneself, but I've looked at the archives and 
realized that 'uname -X' comes from a message of the OP. My guess is 
that the same source directory was previously used to build for 
Solaris (maybe on a shared NFS filesystem?) and some state is being 
picked up by the new ./configure run, which then decides to treat the 
system as Solaris. So unpacking the archive again and building from 
scratch might be a good idea...




Re: [OMPI users] Myrinet optimization with OMP1.3 and macosX

2009-05-04 Thread Bogdan Costescu

On Mon, 4 May 2009, Ricardo Fernández-Perea wrote:


Any idea where I should look for the cause?


Can you try adding '--mca mtl mx --mca pml cm' to the mpirun/mpiexec 
command line, to specify usage of the non-default MX MTL? (Sorry if 
you already do; I haven't found it in your e-mail.)



Re: [OMPI users] Best way to overlap computation and transfer using MPI over TCP/Ethernet?

2009-06-04 Thread Bogdan Costescu

On Thu, 4 Jun 2009, Lars Andersson wrote:

I've been trying to get overlapping computation and data transfer to 
work, without much success so far.


If this is so important to you, why do you insist on using Ethernet 
and not a more HPC-oriented interconnect, which can make progress in 
the background?




Re: [OMPI users] mpirun hangs

2006-02-24 Thread Bogdan Costescu

On Thu, 23 Feb 2006, Emanuel Ziegler wrote:


Unfortunately, I don't know what errno=113 means, but obviously it's a
TCP problem.



From /usr/include/asm/errno.h:


#define EHOSTUNREACH 113 /* No route to host */



Re: [OMPI users] mpirun hangs

2006-02-24 Thread Bogdan Costescu

On Fri, 24 Feb 2006, Emanuel Ziegler wrote:

So "No route to host" means that the TCP packet could not be sent 
(usually host down, broken routing table, network interface down, 
...). But the host is 'ping'able and even rsh works fine.


... or some packet filtering is enabled. Check with 'iptables -L -n', 
run as root. Open MPI and port-based blocking are not compatible; 
search the archives of either this list or the LAM/MPI users list for 
discussions.


BTW, /etc/hosts.allow says "ALL : ALL", so there should be no 
trouble. Do I have to modify /etc/securetty in order to allow 
orterun to access the machines or is the rsh/rlogin entry 
sufficient?


If running commands on remote nodes with rsh already works (as you 
showed in your first message), there shouldn't be any additional 
settings needed.




Re: [OMPI users] Open-MPI and TCP port range

2006-04-20 Thread Bogdan Costescu
On Thu, 20 Apr 2006, Jeff Squyres (jsquyres) wrote:

> Right now, there is no way to restrict the port range that Open MPI
> will use. ... If this becomes a problem for you (i.e., the random
> MPI-chose-the-same-port-as-your-app events happen a lot), let us
> know and we can probably put in some controls to work around this.

I would welcome a discussion about this; on the LAM/MPI lists, several
people asked for a limited port range to allow them to pass through
firewalls or to do tunnelling.
