[O-MPI users] Wrapper names [was: shell interaction]
On Tue, 14 Jun 2005, Brian Barrett wrote: It would be nice if the C++ compiler wrapper were installed under mpicxx, mpiCC, and mpic++ instead of just the latter two. Yeah, we can do that, no problem. Sorry for the silly question, but is there any kind of document or formal recommendation regarding the naming of these wrappers? -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: bogdan.coste...@iwr.uni-heidelberg.de
Re: [O-MPI users] Wrapper names [was: shell interaction]
On Wed, 15 Jun 2005, Brian Barrett wrote: > Make any sense? Makes a lot of sense. Thank you!
Re: [O-MPI users] error creating high priority cq for mthca0
On Tue, 6 Dec 2005, Jeff Squyres wrote: With Tim's response to this -- I'm curious (so that we get correct information on the FAQ) -- is the /etc/security/limits.conf method a system-wide way of setting these values, and "ulimit -l" a per-user way of setting it? You can see the official docs, including examples, at: http://www.kernel.org/pub/linux/libs/pam/Linux-PAM-html/pam-6.html#ss6.12 You can look at limits.conf as a way to set a default (the soft limit) and a maximum (the hard limit) per session. These values are usually static, but if the admin (or some program with enough privileges) modifies them, only shells started afterwards get the new values; already-running shells keep their current values. So it's similar to the system-wide shell environment settings (e.g. /etc/csh.cshrc), which only take effect in shells started after the modification. It's also dissimilar to writing to /proc/sys/, which usually affects all running processes. That doesn't sound quite right to me -- I'm assuming that a user can't "ulimit -l X" where X is larger than the numbers in /etc/security/limits.conf -- can someone confirm if this is right? ... can't "ulimit -l X" where X is larger than the "hard" value from limits.conf.
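[The soft/hard distinction above can be made concrete with a hypothetical /etc/security/limits.conf fragment; the `@hpcusers` group name is made up for illustration:]

```
# /etc/security/limits.conf -- hypothetical entries for locked memory
# <domain>   <type>  <item>    <value>
# soft = the default every new session starts with
*            soft    memlock   65536
# hard = the ceiling up to which 'ulimit -l' may raise the soft limit
@hpcusers    hard    memlock   unlimited
```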
Re: [O-MPI users] error creating high priority cq for mthca0
[ Sorry, I pressed the wrong keys too soon... ] On Wed, 7 Dec 2005, Bogdan Costescu wrote: ... can't "ulimit -l X" where X is larger than the "hard" value from limits.conf. This is true for a normal user; however, root can modify the hard limits in its shells. Also, a batch system running as root on a compute node can set the hard and soft values to something different from the system (limits.conf) values before starting the user shell or compute process, via setrlimit(2). Is the limits.conf issue specific to Linux? Aren't there other OSes that use PAM and would be affected as well?
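[The setrlimit(2) mechanism mentioned above can be sketched from Python, whose standard `resource` module wraps getrlimit/setrlimit; this is only an illustration of the rules on a Unix-like system, not anything a batch system actually ships:]

```python
import resource

# Read the current locked-memory limits, as a batch system would before
# adjusting them for a compute job (RLIMIT_MEMLOCK is what 'ulimit -l'
# and the memlock lines in limits.conf control).
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("soft:", soft, "hard:", hard)

# A normal user may move the soft limit anywhere up to the hard limit;
# only root may raise the hard limit itself.  Re-applying the current
# values, as done here, is always permitted.
resource.setrlimit(resource.RLIMIT_MEMLOCK, (soft, hard))
```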
Re: [OMPI users] Shared-memory problems
On Thu, Nov 3, 2011 at 15:54, Blosch, Edwin L wrote: > - /dev/shm is 12 GB and has 755 permissions > ... > % ls -l output: > > drwxr-xr-x 2 root root 40 Oct 28 09:14 shm This is your problem: it should be something like drwxrwxrwt. It might depend on the distribution; e.g. the following show this to be a bug: https://bugzilla.redhat.com/show_bug.cgi?id=533897 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=317329 and surely you can find more on the subject with your favorite search engine. Another source could be a paranoid sysadmin who has changed the (most likely correct) default setting the distribution came with -- not only Open MPI but any application using shared memory would be affected. Cheers, Bogdan
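[A quick way to check for the drwxrwxrwt mode discussed above: mode 1777, i.e. world-writable plus sticky bit. `shm_world_writable` is a made-up helper name, not an Open MPI function:]

```python
import os
import stat

def shm_world_writable(path="/dev/shm"):
    """True if `path` is world-writable with the sticky bit set
    (drwxrwxrwt), the mode a shared-memory backing directory needs."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH and mode & stat.S_ISVTX)
```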
Re: [OMPI users] Qlogic & openmpi
On Mon, Dec 5, 2011 at 16:12, Ralph Castain wrote: > Sounds like we should be setting this value when starting the process - yes? > If so, what is the "good" value, and how do we compute it? I've also been looking at this for the past few days. What I came up with is a small script, psm_shctx, which sets the envvar and then execs the MPI binary; it is inserted between mpirun and the MPI binary: mpirun psm_shctx my_mpi_app Of course the same effect could be obtained if the orted set the envvar before starting the process. There is however a problem: deciding how many contexts to use. For maximum performance, one should use a 1:1 ratio between MPI ranks and contexts; the highest ratio possible (but with the lowest performance) is 4 MPI ranks per context; another restriction is that each job needs at least 1 context. For example, on AMD cluster nodes with 4 CPUs of 12 cores each (48 cores in total) one gets 16 contexts; assigning all 16 contexts to 48 ranks would mean a ratio of 1:3, but this only works if cores are allocated in multiples of 4 -- with a less advantageous allocation strategy, more contexts are lost to rounding up. At the extreme, if there is only one rank per job, there can be at most 16 jobs -- using all 16 contexts -- and the remaining 32 cores have to stay idle or be used for other jobs that don't communicate over InfiniPath. There is a further issue though: MPI-2 dynamic process creation -- if it's not known in advance how many ranks there will be, I guess one should use the highest context-sharing ratio (1:4) to be on the safe side. I've found a mention of this envvar being handled in the changelog for MVAPICH2 1.4.1 -- maybe that can serve as a source of inspiration? (but I haven't looked at it...) Hope this helps, Bogdan
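[The context arithmetic above can be written down as a small sketch; `contexts_needed` is a hypothetical helper, not part of any MPI or PSM distribution:]

```python
import math

def contexts_needed(ranks, max_share=4):
    """Minimum number of InfiniPath hardware contexts a job needs,
    given that at most `max_share` MPI ranks may share one context
    and every job needs at least one context of its own."""
    return max(1, math.ceil(ranks / max_share))

# 48 ranks fit in ceil(48/4) = 12 of the node's 16 contexts at the
# maximum 1:4 sharing ratio; 16 single-rank jobs exhaust all contexts.
print(contexts_needed(48))  # -> 12
print(contexts_needed(1))   # -> 1
```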
Re: [OMPI users] Strange TCP latency results on Amazon EC2
On Thu, Jan 12, 2012 at 16:10, Jeff Squyres wrote: > It's very strange to me that Open MPI is getting *better* than raw TCP > performance. I don't have an immediate explanation for that -- if you're > using the TCP BTL, then OMPI should be using TCP sockets, just like netpipe > and the others. Could it be that the difference is not in the transfer but in the timing? Cheers, Bogdan
Re: [OMPI users] Network connection check
On Thu, 23 Jul 2009, vipin kumar wrote: 1: whether the slave machine is reachable or not (how will I do that, given that I have the IP address and host name of the slave machine?) 2: if reachable, check whether the programs (orted and the slave process) are alive or not. You don't say, but from your description I infer that you are not using a batch/queueing system, just an rsh/ssh-based start-up mechanism. A batch/queueing system might be able to tell you whether a remote computer is still accessible. I think that MPI is not the proper mechanism for what you want; PVM or, perhaps better, direct socket programming will probably serve you well. -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.coste...@iwr.uni-heidelberg.de
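[A minimal sketch of the direct-socket approach suggested above; `host_reachable` is a made-up name, and it only tells you whether something accepts TCP connections on the given port:]

```python
import socket

def host_reachable(host, port, timeout=2.0):
    """Try a TCP connection to (host, port); True if something accepts,
    False on refusal or timeout.  Probing the port a remote daemon
    listens on distinguishes 'host up and service alive' from 'host
    merely pingable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```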
Re: [OMPI users] openib RETRY EXCEEDED ERROR
Brett Pemberton wrote: [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0 I've seen this error with Mellanox ConnectX cards and OFED 1.2.x with all versions of Open MPI that I have tried (1.2.x and pre-1.3) and with some MVAPICH versions, from which I concluded that the problem lies in the lower levels (OFED or the IB card firmware). Indeed, after the installation of OFED 1.3.x and possibly a firmware update (I'm not sure about the firmware, as I don't administer that cluster), these errors disappeared.
Re: [OMPI users] mpirun hangs when launching job on remote node
On Wed, 18 Mar 2009, Raymond Wan wrote: Perhaps it has something to do with RH's defaults for the firewall settings? If your sysadmin uses kickstart to configure the systems, (s)he has to add 'firewall --disabled'; similarly for SELinux, which seems to have caused problems for another person on this list. OTOH, if (s)he blindly copied the config for a workstation to a cluster node, maybe some more education is needed first... Another system that worked "immediately" was a Debian system. That's because Debian doesn't configure a firewall or SELinux, leaving it to the admin to do so. Anyway, if you find a solution that doesn't require the firewall to be turned off, please let me know -- I think our sysadmin would be interested, too. Depending on your definition of 'firewall turned off', the new feature of restricting the ports used by Open MPI will help: the firewall can stay on, but configured to open the range of ports that Open MPI uses.
Re: [OMPI users] Linux opteron infiniband sunstudio configure, problem
On Tue, 31 Mar 2009, Jeff Squyres wrote: UNAME_REL=`(/bin/uname -X|grep Release|sed -e 's/.*= //')` Not sure what you want to achieve here... 'uname -X' is valid on Solaris, but not on Linux. The OP has already indicated that he is running this on Linux (SLES), so the above line is bound to fail.
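[To illustrate the point -- 'uname -X' is a Solaris extension -- a hedged sketch comparing it with a portable check, assuming a Unix-like system with uname in the PATH:]

```python
import platform
import subprocess

# 'uname -X' is a Solaris extension; on Linux it exits with an error.
result = subprocess.run(["uname", "-X"], capture_output=True, text=True)
print("uname -X exit status:", result.returncode)

# A portable way to branch on the OS instead:
print("platform.system() ->", platform.system())  # e.g. "Linux", "SunOS"
```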
Re: [OMPI users] Linux opteron infiniband sunstudio configure, problem
On Tue, 31 Mar 2009, Bogdan Costescu wrote: 'uname -X' is valid on Solaris, but not on Linux. It's not good to reply to oneself, but I've looked at the archives and realized that 'uname -X' comes from a message of the OP. My guess is that the same source directory was previously used to build for Solaris (maybe on a shared NFS mount?) and some cached state is being picked up by a new ./configure run, which then decides to treat the system as Solaris. So unpacking the archive again and building from scratch might be a good idea...
Re: [OMPI users] Myrinet optimization with OMP1.3 and macosX
On Mon, 4 May 2009, Ricardo Fernández-Perea wrote: any idea where I should look for the cause. Can you try adding '--mca mtl mx --mca pml cm' to the mpirun/mpiexec command line to specify usage of the non-default MX MTL? (Sorry if you already do; I haven't found it in your e-mail.)
Re: [OMPI users] Best way to overlap computation and transfer using MPI over TCP/Ethernet?
On Thu, 4 Jun 2009, Lars Andersson wrote: I've been trying to get overlapping computation and data transfer to work, without much success so far. If this is so important to you, why do you insist on using Ethernet and not a more HPC-oriented interconnect which can make progress in the background?
Re: [OMPI users] mpirun hangs
On Thu, 23 Feb 2006, Emanuel Ziegler wrote: Unfortunately, I don't know what errno=113 means, but obviously it's a TCP problem. From /usr/include/asm/errno.h: #define EHOSTUNREACH 113 /* No route to host */
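[The same lookup can be done programmatically instead of grepping the header; a small sketch -- note the numeric value 113 is Linux-specific:]

```python
import errno
import os

# errno 113 on Linux is EHOSTUNREACH.
print(errno.EHOSTUNREACH)               # 113 on Linux
print(os.strerror(errno.EHOSTUNREACH))  # "No route to host"
```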
Re: [OMPI users] mpirun hangs
On Fri, 24 Feb 2006, Emanuel Ziegler wrote: So "No route to host" means that the TCP packet could not be sent (usually host down, broken routing table, network interface down, ...). But it's 'ping'able and even rsh works fine. ... or some packet filtering is enabled. Check with 'iptables -L -n' run as root. Open MPI and port-based blocking are not compatible; search the archives of either this list or the LAM/MPI users list for the discussions. BTW, /etc/hosts.allow says "ALL : ALL", so there should be no trouble. Do I have to modify /etc/securetty in order to allow orterun to access the machines, or is the rsh/rlogin entry sufficient? If running commands on remote nodes with rsh already works (as you showed in your first message), no additional settings should be needed.
Re: [OMPI users] Open-MPI and TCP port range
On Thu, 20 Apr 2006, Jeff Squyres (jsquyres) wrote: > Right now, there is no way to restrict the port range that Open MPI > will use. ... If this becomes a problem for you (i.e., the random > MPI-chose-the-same-port-as-your-app events happen a lot), let us > know and we can probably put in some controls to work around this. I would welcome a discussion about this; on the LAM/MPI lists several people asked for a limited port range to allow them to pass through firewalls or to do tunneling.
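[The control being asked for boils down to trying bind(2) across an allowed range instead of letting the kernel pick an ephemeral port; a hedged sketch with a made-up helper name, not how Open MPI itself implements it:]

```python
import socket

def bind_in_range(lo, hi, host="0.0.0.0"):
    """Return a TCP socket bound to the first free port in [lo, hi],
    so a firewall only needs to open that range."""
    for port in range(lo, hi + 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            return s
        except OSError:
            s.close()  # port in use, try the next one
    raise RuntimeError("no free port in %d-%d" % (lo, hi))
```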