Re: [OMPI users] busy waiting and oversubscriptions

2014-03-27 Thread Lloyd Brown
I don't know about your users, but experience has, unfortunately, taught us to assume that users' jobs are very, very badly-behaved. I choose to assume that it's incompetence on the part of programmers and users, rather than malice, though. :-) Lloyd Brown Systems Admin

[OMPI users] understanding BTL selection process

2015-03-02 Thread Lloyd Brown
OpenMPI on the non-IB node, without OFED installed, to no longer be able to figure out that it shouldn't use the openib btl. Thus the reason why I ask for more information about how that decision is being made. Maybe that will clue me in, as to what changed. Thanks, -- Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu

Re: [OMPI users] understanding BTL selection process

2015-03-02 Thread Lloyd Brown
of this implies that the difference is related to something that happened with librdmacm, not something that changed in OpenMPI. Sorry for the list noise. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 03/02/2015 02:42 PM, Lloyd Brown wrote

[OMPI users] segfault when resuming on different host

2011-12-29 Thread Lloyd Brown
, but I'm just stumped where to go from here. I have some core files, but I'm having trouble getting the symbols from the backtrace in gdb. Maybe I'm doing it wrong. TIA, -- Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu byufsl_debugging_segfault_on_resume.tar.gz Description: application/gzip

Re: [OMPI users] segfault when resuming on different host

2011-12-29 Thread Lloyd Brown
right direction. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 12/29/2011 02:31 PM, Josh Hursey wrote: > Often this type of problem is due to the 'prelink' option in Linux. > BLCR has a FAQ item that discusses this issue a

Re: [OMPI users] Checkpoint an MPI process

2012-01-19 Thread Lloyd Brown
how to do so, etc.). But if you're writing the application, you're better off to handle it internally, than externally. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 01/19/2012 08:05 AM, Josh Hursey wrote: > Currently

Re: [OMPI users] Mpirun: How to print STDOUT of just one process?

2012-02-01 Thread Lloyd Brown
I've used this technique to play with ulimit sort of things in the script before. I'm not entirely sure what variables are exposed to you in the script, such that you could come up with a unique filename to output to, though. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Y

Re: [OMPI users] ssh between nodes

2012-02-29 Thread Lloyd Brown
reporting resources utilized, etc. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 02/29/2012 02:09 PM, Denver Smith wrote: > Hello, > > On my cluster running moab and torque, I cannot ssh without a password > between comput

Re: [OMPI users] regarding the problem occurred while running anmpi programs

2012-04-25 Thread Lloyd Brown
1.4.5/lib. You really need that to be in LD_LIBRARY_PATH (or some other method) on all nodes, in all shells for the user. One simple way to do this is via the startup files (eg. .bashrc and .bash_profile for bash, .cshrc for csh/tcsh, etc.) Lloyd Brown Systems Administrator Fulton Supercomputing La
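A minimal sketch of the startup-file approach described above, suitable for something like .bashrc. The install prefix is truncated in the snippet, so `OMPI_PREFIX` and its default value are placeholders, and `prepend_ld_path` is a hypothetical helper name rather than anything Open MPI provides. The guard keeps the directory from being added again each time a nested shell sources the file.

```shell
# Placeholder prefix: substitute the directory where Open MPI is actually installed.
OMPI_PREFIX="${OMPI_PREFIX:-/opt/openmpi}"

# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH
# unless it is already listed there.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, leave the variable unchanged
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_ld_path "$OMPI_PREFIX/lib"
```

For non-interactive shells (which is what a remote launch typically starts), it is worth checking that the startup file does not return early before this point.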

[OMPI users] rpmbuild defining opt install path

2012-06-26 Thread Lloyd Brown
o end up with at least 3 versions of v1.6 (gcc compilers, intel compilers, pgi compilers) and possibly a few of a previous version, so putting everything in /opt/openmpi/VERSION, is a little problematic. Thanks, -- Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu

Re: [OMPI users] rpmbuild defining opt install path

2012-06-26 Thread Lloyd Brown
build -bb path/to/openmpi-1.6.spec In this case, the "" are all exactly the same. Clearly there's something I'm missing about the RPM build process. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 06/26/2012 12:

Re: [OMPI users] rpmbuild defining opt install path

2012-06-27 Thread Lloyd Brown
as to where it's installed. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 06/27/2012 11:12 AM, Jeff Squyres wrote: > On Jun 26, 2012, at 2:40 PM, Lloyd Brown wrote: > >> Is there an easy way with the .spec file and the

Re: [OMPI users] Measuring latency

2012-08-21 Thread Lloyd Brown
I'm not really familiar enough to know what you mean by "em slaves", but for general testing of bandwidth and latency, I usually use the "OSU Micro-benchmarks" (see http://mvapich.cse.ohio-state.edu/benchmarks/). Lloyd Brown Systems Administrator Fulton Supercomputing Lab

Re: [OMPI users] Measuring latency

2012-08-21 Thread Lloyd Brown
That's fine. In that case, you just compile it with your MPI implementation and do something like this:

    mpiexec -np 2 -H masterhostname,slavehostname ./osu_latency

There may be some all-to-all latency tools too. I don't really remember. Lloyd Brown Systems Administrator Fulton Supe

[OMPI users] PG compilers and OpenMPI 1.6.1

2012-08-23 Thread Lloyd Brown
t_lock(&_M_lock); > ^ > > "/opt/pgi/linux86-64/12.5/include/CC/stl/_threads.h", line 375: error: > identifier "omp_set_lock" is undefined > omp_set_lock(&_M_lock); > ^ > > "/opt/pgi/linux86-64/12.5/include/CC/stl/_threads.h"

Re: [OMPI users] PG compilers and OpenMPI 1.6.1

2012-08-23 Thread Lloyd Brown
ch compile just fine with 1.6.1. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 08/23/2012 04:43 PM, Jeff Squyres wrote: > This was reported earlier today: > > https://svn.open-mpi.org/trac/ompi/ticket/3251 > >

Re: [OMPI users] PG compilers and OpenMPI 1.6.1

2012-08-27 Thread Lloyd Brown
Thanks for getting this in so quickly. Yes, the nightly tarball from Aug 25 (a1r27142), seems to get through a configure and make stage at least. Thanks, Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 08/25/2012 05:18 AM, Jeff

Re: [OMPI users] PBS jobs with OPENMPI

2012-11-19 Thread Lloyd Brown
supply the number of nodes and nodefile, like this:

    NP=`wc -l $PBS_NODEFILE | awk '{print $1}'`
    mpirun -n $NP -hostfile $PBS_NODEFILE myprogram

Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 11/19/2012 03:28 PM, Mariana Var
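The nodefile approach above can be sketched a little more defensively. This assumes, as under Torque, that `$PBS_NODEFILE` holds one line per allocated slot. `count_slots` is a hypothetical helper name, and `./myprogram` stands in for the real executable.

```shell
# Hypothetical helper: count the slots listed in a PBS/Torque nodefile.
# Reading the file on stdin keeps the filename out of wc's output,
# so no awk step is needed; tr strips any padding some wc versions add.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Usage inside a job script, where the batch system sets PBS_NODEFILE:
#   NP=$(count_slots "$PBS_NODEFILE")
#   mpirun -n "$NP" -hostfile "$PBS_NODEFILE" ./myprogram
```

Quoting `"$PBS_NODEFILE"` guards against spool paths that contain spaces.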

Re: [OMPI users] check point restart

2013-07-19 Thread Lloyd Brown
and internal to your application, choose the application-internal checkpointing. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 07/19/2013 01:34 PM, Erik Nelson wrote: > I run mpi on an NSF computer. One of the conditions of use is

[OMPI users] Debugging Runtime/Ethernet Problems

2013-09-20 Thread Lloyd Brown
general pointers on mpirun debugging flags to use. I can't find much in the docs yet on run-time debugging for OpenMPI, as opposed to debugging the application. Maybe I'm just looking in the wrong place. Thanks, -- Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu

Re: [OMPI users] Debugging Runtime/Ethernet Problems

2013-09-20 Thread Lloyd Brown
, does seem to work. I admit I'm still curious to understand how to get OpenMPI to give me the details of what's going on. But the immediate problem of getting the numbers out of osu_bw and osu_latency, seems to be solved. Thanks everyone. I really appreciate it. -- Lloyd Brown S

Re: [OMPI users] Debugging Runtime/Ethernet Problems

2013-09-20 Thread Lloyd Brown
't have to exclude anything, and it figures out to use em1, and not lo. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 09/20/2013 10:31 AM, Jeff Squyres (jsquyres) wrote: > On Sep 20, 2013, at 12:27 PM, Lloyd Brown wrote:

Re: [OMPI users] Oversubscription of nodes with Torque and OpenMPI

2013-11-22 Thread Lloyd Brown
of physical processors on the hosts. Whether this works for you, depends on whether you want this type of oversubscription to happen all the time, or on a per-job basis, etc. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On 11/22/2013 11:

Re: [OMPI users] Setting bind-to none as default via environment?

2015-11-02 Thread Lloyd Brown
into a situation where users have a combination of OpenMPI and OpenMP threads, and the threads get constrained to the same processor where the OpenMPI process was launched. As far as we can tell, this started with v1.8.x. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young Un

Re: [OMPI users] Setting bind-to none as default via environment?

2015-11-03 Thread Lloyd Brown
n between processors (but within the cgroup) than we would like, but that's still probably acceptable in this scenario. If there's a better solution, we'd love to hear it. Lloyd Brown Systems Administrator Fulton Supercomputing Lab Brigham Young University http://marylou.byu.edu On