Re: [OMPI users] Docker Cluster Queue Manager

2016-06-03 Thread John Hearns
Rob, have you looked at Singularity? https://github.com/gmkurtzer/singularity/releases/tag/2.0 It is a new containerisation framework aimed squarely at HPC. Also, you mention Jupyter. I am learning Julia at the moment, and I looked at the parallel facilities yesterday https://github.com/JuliaParal

Re: [OMPI users] Docker Cluster Queue Manager

2016-06-03 Thread John Hearns
Rob, I really think you should look at the FAQ http://singularity.lbl.gov/#faq Also I don't understand what you mean by 'Our users don't have Unix user IDs'. That is no problem of course - I have worked with Centrify and Samba, where you can define mappings between Windows users and Unix IDs or gro

Re: [OMPI users] Docker Cluster Queue Manager

2016-06-06 Thread John Hearns
Rob, I am not familiar with wakari.io. However, what you say about the Unix userid problem is very relevant to many 'shared infrastructure' projects and is a topic which comes up in discussions about them. The concern there is, as you say, if the managers of the system have a global filesystem, with

[OMPI users] Affinity settings for hyperthreading

2016-07-18 Thread John Hearns
Please can someone point me towards the affinity settings for: OpenMPI 1.10 used with Slurm version 15. I have some nodes with 2630-v4 processors, so 10 cores per socket / 20 hyperthreads. Hyperthreading is enabled. I would like to set affinity for 20 processes per node, so that the processes are

Re: [OMPI users] Affinity settings for hyperthreading

2016-07-18 Thread John Hearns
s to > *core*. Supported options include slot, hwthread, core, l1cache, l2cache, > l3cache, socket, numa, board, and none. > > https://www.open-mpi.org/doc/current/man1/mpirun.1.php#sect9 > > > > > On Jul 17, 2016, at 11:25 PM, John Hearns wrote: > > Please can someone

Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?

2010-08-25 Thread John Hearns
On 24 August 2010 18:58, Rahul Nabar wrote: > There are a few unusual things about the cluster. We are using a > 10GigE ethernet fabric. Each node has dual eth adapters. One 1GigE and > the other 10GigE. These are on separate subnets although the order of > the eth interfaces is variable. i.e. 10G

Re: [OMPI users] Shared memory

2010-09-24 Thread John Hearns
On 24 September 2010 08:46, Andrei Fokau wrote: > We use a C-program which consumes a lot of memory per process (up to few > GB), 99% of the data being the same for each process. So for us it would be > quite reasonable to put that part of data in a shared memory. http://www.emsl.pnl.gov/docs/glo

Re: [OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster (fwd)

2010-11-20 Thread John Hearns
On 20 November 2010 16:31, Gilbert Grosdidier wrote: > Bonjour, Bonjour Gilbert. I manage ICE clusters also. Please could you have a look at /etc/init.d/pbs on the compute blades? Do you have something like: if [ "${PBS_START_MOM}" -gt 0 ] ; then if check_prog "mom" ; then e

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread John Hearns
On 14 December 2010 17:32, Lydia Heck wrote: > > I have experimented a bit more and found that if I set > > OMPI_MCA_plm_rsh_num_concurrent=1024 > > a job with more than 2,500 processes will start and run. > > However when I searched the open-mpi web site for the variable I could > not find an

Re: [OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)

2010-12-17 Thread John Hearns
On 17 December 2010 14:45, Gilbert Grosdidier wrote: > Bonjour, >  About this issue, for which I got NO feedback ;-) Gilbert, as you have an SGI cluster, have you filed a support request to SGI? Also, which firmware do you have installed? I have Firmware version: 2.5.0 http://www.open

Re: [OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)

2010-12-17 Thread John Hearns
On 17 December 2010 14:45, Gilbert Grosdidier wrote: > Bonjour, >  About this issue, for which I got NO feedback ;-) I recently spotted > into btl_openib.c code, that this error message could come from On the cluster admin node, run firmware_revs and look for the Infiniband firmware

Re: [OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)

2010-12-17 Thread John Hearns
On 17 December 2010 15:47, Gilbert Grosdidier wrote: >> > gg= I don't know, and firmware_revs does not seem to be available. > Only thing I got on a worker node was with lspci : If you log into a compute node the command is /usr/sbin/ibstat The firmware_revs command is on the cluster admin

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2011-01-07 Thread John Hearns
On 6 January 2011 21:10, Gilbert Grosdidier wrote: > Hi Jeff, > >  Where's located lstopo command on SuseLinux, please ? > And/or hwloc-bind, which seems related to it ? I was able to get hwloc to install quite easily on SuSE - download/configure/make Configure it to install to /usr/local/bin A

Re: [OMPI users] Concerning infiniband support

2011-01-20 Thread John Hearns
On 20 January 2011 06:59, Zhigang Wei wrote: > Dear all, > > > > > > I want to use infiniband, I am from a University in the US, my University’s > high performance center don’t have Gcc compiled openmpi that support > infiniband, so I want to compile myself. That is a surprise - you must have som

Re: [OMPI users] Help with some fundamentals

2011-01-21 Thread John Hearns
On 20 January 2011 16:50, Olivier SANNIER wrote: > > > > So there is no dynamic discovery of nodes available on the network. Unless, > of course, if I was to write a tool that would do it before the actual run > is started. That is in essence what a batch scheduler does. OK, to be honest it has

Re: [OMPI users] Help with some fundamentals

2011-01-21 Thread John Hearns
On 20 January 2011 16:50, Olivier SANNIER wrote: >> I’ve started looking at beowulf clusters, and that lead me to PBS. Am I > right in assuming that PBS (PBSPro or TORQUE) could be used to do the > monitoring and the load balancing I thought of? Yes, that is correct. An alternative is Gridengine.

Re: [OMPI users] unable to run program

2011-04-02 Thread John Hearns
Mohd, the Clustermonkey site is a good resource for you http://www.clustermonkey.net/

Re: [OMPI users] WRF run on multiple Nodes

2011-04-04 Thread John Hearns
On 2 April 2011 04:16, Ahsan Ali wrote: > Hello, >  I want to run WRF on multiple nodes in a linux cluster using openmpi, > giving the command mpirun -np 4 ./wrf.exe just submit it to the single node > . I don't know how to run it on other nodes as well. Help needed. Ahsan, you have a Dell clu

Re: [OMPI users] How to add nodes while running job

2011-08-30 Thread John Hearns
On 30 August 2011 02:55, Ralph Castain wrote: > Instead, all used dynamic requests - i.e., the job that was doing a > comm_spawn would request resources at the time of the comm_spawn call. I > would pass the request to Torque, and if resources were available, > immediately process them into OMP

Re: [OMPI users] mpirun hangs when used on more than 2 CPUs

2012-01-17 Thread John Hearns
Andre, you should not need the OpenMPI sources. Install the openmpi-devel package from the same source (zypper install openmpi-devel if you have that science repository enabled). This will give you the mpi.h file and other include files, libraries and manual pages. That is a convention in Suse-sty

Re: [OMPI users] IO performance

2012-02-04 Thread John Hearns
On 03/02/2012, Tom Rosmond wrote: > Recently the organization I work for bought a modest sized Linux cluster > for running large atmospheric data assimilation systems. In my > experience a glaring problem with systems of this kind is poor IO > performance. Typically they have 2 types of network:

Re: [OMPI users] (no subject)

2012-03-17 Thread John Hearns
Harini, you can install OpenMPI which is packaged for your distribution of Linux, for example on SuSE use zypper install openmpi or the equivalent on Redhat/Ubuntu. You probably will not get the most up to date Openmpi version, but you will get the library paths set up in /etc/ld.so.conf.d/ and t

Re: [OMPI users] MPI daemon died unexpectedly

2012-03-27 Thread John Hearns
Have you checked the system logs on the machines where this is running? Is it perhaps that the processes use lots of memory and the Out Of Memory (OOM) killer is killing them? Also check all nodes for left-over 'orphan' processes which are still running after a job finishes - these should be killed

Re: [OMPI users] MPI daemon died unexpectedly

2012-03-27 Thread John Hearns
n the nodes. The failure > happened on Friday and after that tens of similar jobs completed > successfully. > > Regards, > Grzegorz Maj > > 2012/3/27 John Hearns : >> Have you checked the system logs on the machines where this is running? >> Is it perhaps that the proc

Re: [OMPI users] How to check processes working in parallel on one node of MPI cluster

2012-06-24 Thread John Hearns
It is well worth installing 'htop' to help diagnose situations like this.

Re: [OMPI users] UC Permission denied, please try again.

2012-08-02 Thread John Hearns
On 02/08/2012, Syed Ahsan Ali wrote: > Yes the issue has been diagnosed. I can ssh them but they are asking for > passwords You need to configure 'passwordless ssh' Can we assume that your home directory is shared across all cluster nodes? That means when you log into a cluster node the director

Re: [OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-01 Thread John Hearns
Apologies, I have not taken the time to read your comprehensive diagnostics! As Gus says, this sounds like a memory problem. My suspicion would be the kernel Out Of Memory (OOM) killer. Log into those nodes (or ask your systems manager to do this). Look closely at /var/log/messages where there wil
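
A minimal sketch of the log check suggested above, assuming a syslog-style text file such as /var/log/messages. The two search strings are an assumption about typical kernel wording, which varies between kernel versions, and the function names are made up for this illustration.

```python
import re

# Assumed markers for OOM killer activity; the exact kernel wording differs
# between kernel versions, so treat this pattern as a starting point.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def oom_lines(lines):
    """Return the log lines that look like OOM killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path="/var/log/messages"):
    """Scan one log file on the local node (usually needs root to read)."""
    with open(path, errors="replace") as handle:
        return oom_lines(handle)
```

Running `scan_log()` on each node (for example through pdsh) would show whether any node has killed a process for lack of memory.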

Re: [OMPI users] Need solution- nodes can't find the paths.

2012-10-03 Thread John Hearns
You need to either copy the data to storage which the cluster nodes have mounted. Surely your cluster vendor included local storage? Or you can configure the cluster head node to export the SAN volume by NFS

Re: [OMPI users] Need solution- nodes can't find the paths.

2012-10-03 Thread John Hearns
Data is large and cannot be copied to the local drives of the compute nodes as the data is large. I understand that. I think that you have storage attached to your cluster head node - the 'SAN storage' you refer to. Let's call that volume /data. All you need to do is edit the /etc/exports file o
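
As a rough illustration of the /etc/exports edit described above, the sketch below builds one export entry in the usual `path client(options)` form. The subnet and the option list are placeholders chosen for the example, not values from the thread, and `exports_line` is a hypothetical helper.

```python
def exports_line(path, clients, options=("rw", "sync")):
    """Format one /etc/exports entry: '<path> <clients>(<opt>,<opt>,...)'."""
    return f"{path} {clients}({','.join(options)})"


# Hypothetical compute-node subnet; substitute the real cluster network.
entry = exports_line("/data", "192.168.1.0/24")
print(entry)
```

After adding such a line on the head node, the export table normally has to be reloaded (for instance with `exportfs -ra`) and the volume mounted on each compute node.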

Re: [OMPI users] Need solution- nodes can't find the paths.

2012-10-03 Thread John Hearns
If I may ask, which company installed this cluster for you? Surely they will advise on how to NFS mount the storage on the compute nodes?

Re: [OMPI users] Tip for HPC cluster admins

2012-10-29 Thread John Hearns
Jeff, this is very good advice. I have had many, many hours of deep joy getting to know the OOM killer and all of his wily ways. Respect the OOM Killer! On the cluster I manage, the OOM killer is working; however, there is a strict policy that if the OOM killer kicks in on a cluster node it is excluded f

Re: [OMPI users] Infiniband errors

2012-11-28 Thread John Hearns
Short answer. Run ibstat or ibstatus. Look also at the logs of your subnet manager.

Re: [OMPI users] Infiniband errors

2012-11-28 Thread John Hearns
Those diagnostics are from Openfabrics. What type of infiniband card do you have? What drivers are you using?

Re: [OMPI users] very low performance over infiniband

2013-01-27 Thread John Hearns
2 percent? Have you logged into a compute node and run a simple top when the job is running? Are all the processes distributed across the CPU cores? Are the processes being pinned properly to a core? Or are they hopping from core to core? Also make SURE all nodes have booted with all cores online
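
One way to answer the pinning question from inside a process is to ask the kernel which CPUs the process may run on. The sketch below is an illustration only: `compact_ranges` and `current_affinity` are made-up helper names, and `os.sched_getaffinity` is available on Linux only.

```python
import os


def compact_ranges(cpus):
    """Render a set of CPU ids as a compact string such as '0-3,10'."""
    parts = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = cpu
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def current_affinity():
    """CPUs the calling process is allowed to run on (Linux only)."""
    return compact_ranges(os.sched_getaffinity(0))
```

A rank bound to a single core would report one CPU id, while an unbound rank would report the whole range of online CPUs.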

Re: [OMPI users] very low performance over infiniband

2013-01-28 Thread John Hearns
Have you run ibstat on every single node and made sure all links are up at the correct speed? Have you checked the output to make sure that you are not somehow running over ethernet?

Re: [OMPI users] control openmpi or force to use pbs?

2013-02-05 Thread John Hearns
Lart your users. It's the only way. They will thank you for it, eventually. www.catb.org/jargon/html/L/LART.html

Re: [OMPI users] error while loading shared libraries: libhdf5.so.7:

2013-02-07 Thread John Hearns
ldd rca.x Try logging in to each node and run this command. Even better use pdsh
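
The ldd check above can be scripted so that only unresolved libraries are reported. This is a sketch under the assumption that ldd marks them with `=> not found`, as the GNU ldd does; the helper names are invented for the example.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary on the local node and list unresolved libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)
```

Calling `check_binary("./rca.x")` on every node should return an empty list wherever the HDF5 library is visible to the dynamic loader.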

Re: [OMPI users] mpirun (Aborted) error

2013-02-24 Thread John Hearns
Backing up what Matthieu said, can you run a simple Hello world mpi application first? Then something like a Pallas run - just to make sure you can run applications in parallel.

Re: [OMPI users] Mixing Linux's CPU-shielding with mpirun's bind-to-core

2013-08-18 Thread John Hearns
For information, if you use a batch system such as PbsPro or Torque it can be configured to set up the cpuset for a job and start the job within the cpuset. It will also destroy the cpuset at the end of a job. Highly useful for job cpu binding as you say and also if you have a machine running many

Re: [OMPI users] Mixing Linux's CPU-shielding with mpirun's bind-to-core

2013-08-18 Thread John Hearns
On a bug system you can boot the system into a 'boot cpuset'. So all system processes run in a small number of low numbered cores. Plus any login sessions. The batch system then creates cpusets in the higher numbered cores - free from OS interference.

Re: [OMPI users] Mixing Linux's CPU-shielding with mpirun's bind-to-core

2013-08-18 Thread John Hearns
Bug system? Big system!

Re: [OMPI users] Mixing Linux's CPU-shielding with mpirun's bind-to-core

2013-08-18 Thread John Hearns
You really should install a job scheduler. There are free versions. I'm not sure about cpuset support in Gridengine. Anyone?

Re: [OMPI users] Mixing Linux's CPU-shielding with mpirun's bind-to-core

2013-08-21 Thread John Hearns
Agree with what you say Dave. Regarding not wanting jobs to use certain cores, i.e. reserving low-numbered cores for OS processes, then surely a good way forward is to use a 'boot cpuset' of one or two cores and let your jobs run on the rest of the cores. You're right about cpusets being helpful wit

Re: [OMPI users] Mixing Linux's CPU-shielding with mpirun's bind-to-core

2013-08-23 Thread John Hearns
On 23 August 2013 12:36, Dave Love wrote: > John Hearns writes: > > > > cpuset' of one or two cores and let your jobs run on the rest of the > cores. > > Maybe, if you make sure the resource manager knows about it, and users > don't mind losing the cor

Re: [OMPI users] Trouble with Suse Linux Enterprise Server 11 Installation

2013-08-30 Thread John Hearns
I agree with what Ralph says. I have a lot of experience in running SLES 10 and 11 systems and many flavours of Opensuse. I am not sure if rpms for Openmpi are available for Sles - I will check. Installing Openfoam is a pig I agree. You could be better off with Opensuse from a point of view of Openm

Re: [OMPI users] Trouble with Suse Linux Enterprise Server 11 Installation

2013-08-30 Thread John Hearns
I just checked on the Opensuse Build Service. There are OpenMPI RPMs available for SLES 11 SP2 - but not SLES 11 SP1 http://software.opensuse.org/package/openmpi (then click on Show Other Versions). I have got openmpi installed on my SLES 11 SP1 system, version 1.3.2. Zypper says it is rovide

Re: [OMPI users] Trouble with Suse Linux Enterprise Server 11 Installation

2013-08-30 Thread John Hearns
Also for info an Opensuse 12.2 system has openmpi 1.5.4 packaged with it.

Re: [OMPI users] Trouble with Suse Linux Enterprise Server 11 Installation

2013-08-30 Thread John Hearns
Serves me right for not reading your original mail: you have SLES 11 SP2. The openmpi RPMs are provided by the SuSE Software Development Kit DVD. You can download ISO images of this from the SUSE website. For SP1 it is named SLE-SP1-SDK-DVD-x86_64-GM-DVD1.iso So you should copy this .iso file

Re: [OMPI users] Changing directory from /tmp

2013-09-04 Thread John Hearns
Also should be able to define $TMPDIR with your batch system. This can be on a much bigger disk.

Re: [OMPI users] Query name of appfile

2013-09-17 Thread John Hearns
Not a good answer to your question but you could look for the child processes and look at /proc/$pid/cmdline and cwd Or just use pgrep -P $pidofmpirun This is not a good answer. I'm sitting at lunch - so an expert will be along in a minute with a good answer.
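
The /proc lookup mentioned above could be scripted roughly as follows. This is a sketch for Linux only; the helper names are invented here, and reading another user's /proc entries needs the matching permissions.

```python
import os


def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    if not raw:
        return []  # kernel threads have an empty cmdline
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


def process_info(pid):
    """Return the argument list and working directory of a process."""
    with open(f"/proc/{pid}/cmdline", "rb") as handle:
        argv = parse_cmdline(handle.read())
    return {"argv": argv, "cwd": os.readlink(f"/proc/{pid}/cwd")}
```

Applied to the PID of mpirun, the `argv` list would include whatever appfile name was passed on its command line, and `cwd` shows the directory that a relative appfile path is resolved against.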

Re: [OMPI users] line 60: echo: write error: No space left on device

2013-10-01 Thread John Hearns
Do you have a filesystem which is full? df will tell you. Or maybe mounted read only.

Re: [OMPI users] line 60: echo: write error: No space left on device

2013-10-01 Thread John Hearns
Good to hear that!

Re: [OMPI users] Get your Open MPI schwag!

2013-10-23 Thread John Hearns
OpenMPI aprons. Nice! Good to wear when cooking up those Chef recipes. (Did I really just say that...)

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-17 Thread John Hearns
'Htop' is a very good tool for looking at where processes are running.

Re: [OMPI users] Failing to MPI run on my linux cluster

2014-01-09 Thread John Hearns
I got NIST Fire and Smoke installed and running for a customer at my last job. The burning sofa demo is pretty nifty!

Re: [OMPI users] Running on two nodes slower than running on one node

2014-01-30 Thread John Hearns
Ps. 'htop' is a good tool for looking at where processes are running.

Re: [OMPI users] run a program

2014-02-26 Thread John Hearns
Khadije - you need to give a list of compute hosts to mpirun. And probably have to set up passwordless ssh to each host.

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-02-27 Thread John Hearns
Noam, cpusets are a very good idea. Not only for CPU binding but for isolating 'badly behaved' applications. If an application starts using huge amounts of memory - kill it, collapse the cpuset and it is gone - nice clean way to manage jobs.

Re: [OMPI users] using OpenMPI + SGE in a heterogeneous network

2008-06-06 Thread John Hearns
On Fri, 2008-06-06 at 17:56 +0100, SLIM H.A. wrote: > Hi > > I want to use SGE to run jobs on a cluster with mx and infiniband nodes. > By dividing the nodes into two host groups SGE will submit to either > interconnect. > > The interconnect can be specified in the mpirun command with the --mca >

Re: [OMPI users] Running Open MPI on Ethernet

2008-08-08 Thread John Hearns
loop several times and you will see what I mean. If you have other machines on the network, you have to configure them such that you can start remote processes on them. When you use "mpirun" to launch your MPI code you need to give the names of those machines as a parameter to mpirun - it is known as a "machines file". John Hearns
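
A small sketch of generating such a machines file. It assumes Open MPI's hostfile syntax of one host per line with an optional `slots=N` count; the node names and slot counts are placeholders and `hostfile_text` is a hypothetical helper.

```python
def hostfile_text(hosts):
    """Render {hostname: slots} as Open MPI hostfile lines."""
    return "".join(f"{name} slots={slots}\n" for name, slots in hosts.items())


# Hypothetical two-node cluster with four slots on each node.
text = hostfile_text({"node01": 4, "node02": 4})
print(text, end="")
```

The resulting file would then be passed to mpirun with its `--hostfile` (or `--machinefile`) option.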

Re: [OMPI users] Linpack Benchmark and File Descriptor Limits

2008-09-18 Thread John Hearns
2008/9/18 Alex Wolfe > Hello, > > I am trying to run the HPL benchmarking software on a new 1024 core cluster > that we have set up. Unfortunately I'm hitting the "mca_oob_tcp_accept: > accept() failed: Too many open files (24)" error known in version 1.2 of > openmpi. No matter what I set the fil

Re: [OMPI users] Linpack Benchmark and File Descriptor Limits

2008-09-19 Thread John Hearns
2008/9/19 Alex Wolfe > I'm just running it using mpirun from the command line. Thanks for the > reply. > >> > > Have you checked what ulimit -a returns on all the nodes on your cluster, ie when you ssh into them what does ulimit -a give you? I may be on the wrong track here.
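
The open-files limit that `ulimit -a` reports can also be read from inside a process, which shows the limit a launched job actually inherits rather than the one an interactive login gets. A minimal sketch using Python's standard `resource` module (Unix only); the function names are made up for the example.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit the way ulimit does, with 'unlimited' for no cap."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = open_file_limits()
print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

Running this through the same launch path as the failing job (ssh or the batch system) on each node would show whether the raised limit is really in effect there.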

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread John Hearns
Regarding hyperthreading, and finding out information about your CPUs in detail, there is the excellent hwloc project from OpenMPI http://www.open-mpi.org/projects/hwloc/ I downloaded the 1.0 release candidate, and it compiled and ran first time on Nehalem systems. Gives a superb and helpful view

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-06 Thread John Hearns
Gus, I'm not using OpenMPI, however OpenSUSE 11.2 with current updates seems to work fine on Nehalem. I'm curious that you say the Nvidia graphics driver does not install - have you tried running the install script manually, rather than downloading an RPM etc? I'm using version 195.36.15 and it

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-07 Thread John Hearns
On 7 May 2010 03:17, Jeff Squyres wrote: > > Indeed.  I have seen some people have HT enabled in the bios just so that > they can have the software option of turning them off via linux -- then you > can run with HT and without it and see what it does to your specific codes. I may have missed t

[OMPI users] Select a card in a multi card system

2015-04-15 Thread John Hearns
If you have a system with two IB cards, can you choose using a command line switch which card to use with Openmpi? Also a more general question - can you change (or throttle back) the speed at which an Infiniband card works? For example, to use an FDR card at QDR speeds. Thanks for any insight

Re: [OMPI users] SGE segfaulting with OpenMPI 1.8.6

2015-07-23 Thread John Hearns
You say that you can run the code OK 'by hand' with an mpirun. Are you assuming somehow that the Gridengine jobs will inherit your environment variables, paths etc? If I remember correctly, you should submit with the -V option to pass over environment settings. Even better, make sure that the jo

Re: [OMPI users] NUMA: Non-local memory access and performance effects on OpenMPI

2015-07-27 Thread John Hearns
As an aside, with Slurm you can use: sbatch --ntasks-per-socket= I would hazard a guess that this uses the OpenMPI syntax as above to perform the binding to core! On 27 July 2015 at 09:47, Ralph Castain wrote: > As you say, it all depends on your kernel :-) > > If the numactl libraries are a

Re: [OMPI users] Raspberry Pi 2 Beowulf Cluster for OpenFOAM

2016-01-24 Thread John Hearns
Hi Steve. Regarding Step 3, have you thought of using some shared storage? NFS shared drive perhaps, or there are many alternatives! On 23 January 2016 at 20:47, Steve O'Hara wrote: > Hi, > > > > I’m afraid I’m pretty new to both OpenFOAM and openMPI so please excuse me > if my questions are eit

Re: [OMPI users] Ubuntu and LD_LIBRARY_PATH

2016-04-26 Thread John Hearns
Rob, I agree with what Dave Love says. The distro packaged OpenMPI packages should set things up OK for you. I guess that is true on the head node, but from what you say maybe the cluster compute nodes are being installed some other way. On HPC clusters, when you are managing alternate packages

Re: [OMPI users] /dev/shm

2008-11-19 Thread John Hearns
2008/11/19 Ray Muno > Thought I would revisit this one. > > We are still having issues with this. It is not clear to me what is leaving > the user files behind in /dev/shm. > > This is not something users are doing directly, they are just compiling > their code directly with mpif90 (from OpenMPI)

Re: [OMPI users] /dev/shm

2008-11-20 Thread John Hearns
2008/11/20 Ray Muno > J >> >> >> >> > OK, what should I be seeing when I run "ipcs -p"? > > Looks like I don't know my System V from my POSIX. I know what to do.

Re: [OMPI users] signal 15 (terminated)

2009-02-04 Thread John Hearns
2009/2/4 Hana Milani > > Is there a local system administrator that you can talk to about this? > > Not a very good one, I'm sure he or she is just gonna LOVE you. I would seriously advise a big box of doughnuts on http://www.sysadminday.com/ And please cut the HTML formatting with the bold tex

Re: [OMPI users] Linux opteron infiniband sunstudio configure problem

2009-03-30 Thread John Hearns
2009/3/30 Kevin McManus : > > > I can find psm libs at... > > /usr/lib/libpsm_infinipath.so.1.0 > /usr/lib/libpsm_infinipath.so.1 > /usr/lib64/libpsm_infinipath.so.1.0 > /usr/lib64/libpsm_infinipath.so.1 On x86_64 type systems /usr/lib64 are the 64 bit libraries, /usr/lib are the 32 bit ones
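
Whether a given library file is a 32-bit or 64-bit build can be confirmed from its ELF header, which is what the `file` command inspects. The sketch below reads the EI_CLASS byte directly; the function names are invented for the example.

```python
def elf_class(header):
    """Classify a file from its first bytes: '32-bit', '64-bit',
    'unknown' (ELF with an unexpected class byte) or 'not ELF'."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")


def library_class(path):
    """Read just enough of a file on disk to classify it."""
    with open(path, "rb") as handle:
        return elf_class(handle.read(5))
```

For example, `library_class("/usr/lib64/libpsm_infinipath.so.1")` would be expected to report a 64-bit object on the system described above.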

Re: [OMPI users] libnuma issue

2009-04-03 Thread John Hearns
2009/4/3 Francesco Pietra : > > "expected file /usr/lib/include/numa.h was not found" > > In debian amd64 lenny numa.h has a different location > "/usr/include/numa.h". Attached is the config.log. > > I would appreciate help in circumventing the problem. It is /usr/include/numa.h on SuSE also (SLE

Re: [OMPI users] libnuma issue

2009-04-03 Thread John Hearns
2009/4/3 Francesco Pietra : > I was not sure whether that is a technically correct procedure. It works. > Thanks > It most certainly is not. But I have been a Unix system admin for many years. I have done things which I am not proud of. If I ever offer to let you use my keyboard, wash your

Re: [OMPI users] Problem with running openMPI program

2009-04-06 Thread John Hearns
2009/4/6 Ankush Kaul : >> Also how do i come to know that the program is using resources of both the > nodes? Log into the second node before you start the program. Run 'top' Seriously - top is a very, very useful utility.

Re: [OMPI users] could oversubscription clobber an executable?

2009-05-14 Thread John Hearns
2009/5/14 Valmor de Almeida : > > Hello, > > I am wondering whether light oversubscription could lead to a clobbered > program. Apologies if this is a stupid reply. Have you checked if the OOM killer (out of memory killer) is being triggered when you run the program on the laptop? Open a separate w

Re: [OMPI users] New to (Open)MPI

2016-09-01 Thread John Hearns via users
Hello Lachlan. I think Jeff Squyres will be along in a short while! He is of course the expert on Cisco. In the meantime a quick Google turns up: http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/usnic/c/deployment/2_0_X/b_Cisco_usNIC_Deployment_Guide_For_Standalone_C-SeriesServers.html

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread John Hearns via users
Mahmood, as Gilles says, start by looking at how that application is compiled and linked. Run 'ldd' on the executable and look closely at the libraries. Do this on a compute node if you can. There was a discussion on another mailing list recently about how to fingerprint executables and see which a

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread John Hearns via users
Mahmood, are you compiling and linking this application? Or are you using an executable which someone else has prepared? It would be very useful if we could know the application. On 2 September 2016 at 16:35, Mahmood Naderan wrote: > >Did you ran > >ulimit -c unlimited > >before invoking mpi

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread John Hearns via users
Thank you. That is helpful. Could you run an 'ldd' on your executable, on one of the compute nodes if possible? I will not be able to solve your problem, but at least we now know what the application is, and can look at the libraries it is using. On 2 September 2016 at 17:19, Mahmood Naderan w

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread John Hearns via users
Sergei, what does the command "ibv_devinfo" return please? I had a recent case like this, but on Qlogic hardware. Sorry if I am mixing things up. On 28 October 2016 at 10:48, Sergei Hrushev wrote: > Hello, All ! > > We have a problem with OpenMPI version 1.10.2 on a cluster with newly > inst

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread John Hearns via users
Sorry - shoot down my idea. Over to someone else (me hides head in shame) On 28 October 2016 at 11:28, Sergei Hrushev wrote: > Sergei, what does the command "ibv_devinfo" return please? >> >> I had a recent case like this, but on Qlogic hardware. >> Sorry if I am mixing things up. >> >> > An

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread John Hearns via users
Sergei, can you run: ibhosts, ibstat, ibdiagnet. Lord help me for being so naive, but do you have a subnet manager running? On 1 November 2016 at 06:40, Sergei Hrushev wrote: > Hi Jeff ! > > What does "ompi_info | grep openib" show? >> >> > $ ompi_info | grep openib > MCA bt

Re: [OMPI users] install OpenMPI on CentOS in HPC

2016-12-18 Thread John Hearns via users
Mahmoud, you should look at the OpenHPC project. http://www.openhpc.community/ On 15 December 2016 at 19:50, Mahmoud MIRZAEI wrote: > Dears, > > May you please let me know if there is any procedure to install OpenMPI on > CentOS in HPC? > > Thanks. > Mahmoud > > > > _

Re: [OMPI users] Communicating MPI processes running in Docker containers in the same host by means of shared memory?

2017-03-24 Thread John Hearns via users
Jordi, this is not an answer to your question. However have you looked at Singularity: http://singularity.lbl.gov/ On 24 March 2017 at 08:54, Jordi Guitart wrote: > Hello, > > Docker allows several containers running in the same host to share the > same IPC namespace, thus they can share mem

Re: [OMPI users] Basic build trouble on RHEL7

2017-04-27 Thread John Hearns via users
Ray, probably a stupid question but do you have the hwloc-devel package installed? And also the libxml2-devel package? On 27 April 2017 at 21:54, Ray Sheppard wrote: > Hi All, > I have searched the mail archives because I think this issue was > addressed earlier, but I can not find anything

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-18 Thread John Hearns via users
Gabriele, as this is based on OpenMPI can you run ompi_info then look for the btl which are available and the mtl which are available? On 18 May 2017 at 14:10, Reuti wrote: > Hi, > > > Am 18.05.2017 um 14:02 schrieb Gabriele Fatigati : > > > > Dear OpenMPI users and developers, I'm using IBM

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-18 Thread John Hearns via users
CA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v10.1.0) > > > about mtl no information retrieve ompi_info > > > 2017-05-18 14:13 GMT+02:00 John Hearns via users >: > >> Gabriele, as this is based on OpenMPI can you run ompi_info >> then look for the btl which a

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-18 Thread John Hearns via users
Gabriele, please run 'ibv_devinfo'. It looks to me like you may have the physical interface cards in these systems, but you do not have the correct drivers or libraries loaded. I have had similar messages when using Infiniband on x86 systems - which did not have libibverbs installed. On 19 May

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
it does not work, can run and post the logs) > > mpirun --mca pml ^pami --mca pml_base_verbose 100 ... > > > Cheers, > > > Gilles > > > On 5/19/2017 4:01 PM, Gabriele Fatigati wrote: > >> Hi John, >> Infiniband is not used, there is a single node on

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread John Hearns via users
Gilles, Allan, if the host 'smd' is acting as a cluster head node it is not a must for it to have an Infiniband card. So you should be able to run jobs across the other nodes, which have Qlogic cards. I may have something mixed up here, if so I am sorry. If you want also to run jobs on the smd hos

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread John Hearns via users
Allan, remember that Infiniband is not Ethernet. You don't NEED to set up IPoIB interfaces. Two diagnostics please for you to run: ibnetdiscover and ibdiagnet. Let us please have the results of ibnetdiscover. On 19 May 2017 at 09:25, John Hearns wrote: > Gilles, Allan, > > if the

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
folks will comment on that shortly. >>> >>> >>> meanwhile, you do not need pami since you are running on a single node >>> >>> mpirun --mca pml ^pami ... >>> >>> should do the trick >>> >>> (if it does not w

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
>>> findActiveDevices Error >>> We found no active IB device ports >>> Hello world from rank 0 out of 1 processors >>> >>> So it seems to work apart the error message. >>> >>> >>> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillard

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
neral case. Supercomputer clusters running over high performance fabrics are complicated beasts. It is not sufficient to plug in cards and cables. On 19 May 2017 at 11:12, John Hearns wrote: > I am not sure I agree with that. > (a) the original error message from Gabriele was quite

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread John Hearns via users
Michael, try --mca plm_rsh_agent ssh. I've been fooling with this myself recently, in the context of a PBS cluster. On 22 June 2017 at 16:16, Michael Di Domenico wrote: > is it possible to disable slurm/munge/psm/pmi(x) from the mpirun > command line or (better) using environment variables? > >

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread John Hearns via users
> You can add "OMPI_MCA_plm=rsh OMPI_MCA_sec=^munge" to your environment > > > On Jun 22, 2017, at 7:28 AM, John Hearns via users < > users@lists.open-mpi.org> wrote: > > Michael, try > --mca plm_rsh_agent ssh > > I've been fooling with this myself recentl
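
The two forms in this exchange are related by a naming convention: an MCA parameter given as `--mca <name> <value>` on the mpirun command line can instead be set through an environment variable called `OMPI_MCA_<name>`. A minimal sketch of that mapping follows; `mca_env` is a hypothetical helper, not part of Open MPI.

```python
def mca_env(params):
    """Map MCA parameter names to OMPI_MCA_<name> environment variables."""
    return {f"OMPI_MCA_{name}": str(value) for name, value in params.items()}


# The two settings quoted in the reply above.
env_vars = mca_env({"plm": "rsh", "sec": "^munge"})
for key, value in env_vars.items():
    print(f"export {key}={value}")
```

A launcher script could merge `env_vars` into `os.environ` before starting mpirun, which keeps the command line itself unchanged.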

[OMPI users] Openmpi with btl_openib_ib_service_level

2017-06-22 Thread John Hearns via users
I may have asked this recently (if so sorry). If anyone has worked with QoS settings with OpenMPI please ping me off list, e.g. mpirun --mca btl_openib_ib_service_level N
