Re: [OMPI users] shared memory (sm) module not working properly?

2010-01-15 Thread Eugene Loh
Dunno.  Do lower np values succeed?  If so, at what value of np does the job no longer start? Perhaps it's having a hard time creating the shared-memory backing file in /tmp.  I think this is a 64-Mbyte file.  If this is the case, try reducing the size of the shared area per this FAQ item:  ht

Re: [OMPI users] shared memory (sm) module not working properly?

2010-01-15 Thread Nicolas Bock
Sorry, I forgot to give more details on what versions I am using: OpenMPI 1.4; Ubuntu 9.10, kernel 2.6.31-16-generic #53-Ubuntu; gcc (Ubuntu 4.4.1-4ubuntu8) 4.4.1. On Fri, Jan 15, 2010 at 15:47, Nicolas Bock wrote: > Hello list, > > I am running a job on a 4 quadcore AMD Opteron. This machine ha

[OMPI users] shared memory (sm) module not working properly?

2010-01-15 Thread Nicolas Bock
Hello list, I am running a job on a 4 quadcore AMD Opteron. This machine has 16 cores, which I can verify by looking at /proc/cpuinfo. However, when I run a job with mpirun -np 16 -mca btl self,sm job I get this error: -- A

Re: [OMPI users] dynamic rules

2010-01-15 Thread Daniel Spångberg
I tried this and it still crashes with openmpi-1.4. Is it supposed to work with openmpi-1.4 or do I need to compile openmpi-1.4.1? Terribly sorry, I should have checked my own notes thoroughly before giving others advice. One needs to give the dynamic rules file location on the command line:

Re: [OMPI users] Checkpoint/Restart error

2010-01-15 Thread Andreea Costea
It's almost midnight here, so I left home, but I will try it tomorrow. There were some directories left after "make uninstall". I will give more details tomorrow. Thanks Jeff, Andreea On Fri, Jan 15, 2010 at 11:30 PM, Jeff Squyres wrote: > On Jan 15, 2010, at 8:07 AM, Andreea Costea wrote: > >

Re: [OMPI users] Checkpoint/Restart error

2010-01-15 Thread Jeff Squyres
On Jan 15, 2010, at 8:07 AM, Andreea Costea wrote: > - I wanted to update to version 1.4.1 and I uninstalled the previous version like > this: make uninstall, and then manually deleted all the leftover files. The > directory where I installed was /usr/local. I'll let Josh answer your CR questions,

Re: [OMPI users] dynamic rules

2010-01-15 Thread Roman Martonak
>I have done this according to suggestion on this list, until a fix comes >that makes it possible to change via command line: > >To choose bruck for all message sizes / mpi sizes with openmpi-1.4 > >File $HOME/.openmpi/mca-params.conf (replace /homeX) so it points to >the correct file: >coll_tu

Re: [OMPI users] Checkpoint/Restart error

2010-01-15 Thread Andreea Costea
I don't know what else I should try... because it worked on 1.3.3 doing exactly the same steps. I tried to install it both with an active eth interface and an inactive one. I am running on a virtual machine that has CentOS as its OS. Any suggestions? Thanks, Andreea On Fri, Jan 15, 2010 at 9:07 PM,

Re: [OMPI users] dynamic rules

2010-01-15 Thread Daniel Spångberg
I have done this according to suggestion on this list, until a fix comes that makes it possible to change via command line: To choose bruck for all message sizes / mpi sizes with openmpi-1.4 File $HOME/.openmpi/mca-params.conf (replace /homeX) so it points to the correct file: coll_tune

Re: [OMPI users] Rapid I/O support

2010-01-15 Thread Scott Atchley
On Jan 14, 2010, at 3:08 PM, Jeff Squyres wrote: On Jan 14, 2010, at 1:59 PM, TONY BASIL wrote: I am doing a project with an HPC setup on multicore Power PC. Nodes will be connected using Rapid I/O instead of Gigabit Ethernet. I would like to know if OpenMPI supports Rapid I/O... I'm

Re: [OMPI users] More NetBSD fixes

2010-01-15 Thread Jed Brown
On Thu, 14 Jan 2010 21:55:06 -0500, Jeff Squyres wrote: > That being said, you could sign up on it and then set your membership to > receive no mail...? This is especially dangerous because the Open MPI lists munge the Reply-To header, which is a bad thing http://www.unicom.com/pw/reply-to-ha

Re: [OMPI users] Checkpoint/Restart error

2010-01-15 Thread Andreea Costea
I tried the new version that was uploaded today. I still have that error, except that it is now at line 405 instead of 399. Maybe it helps if I give more details: - I first had OpenMPI version 1.3.3 with BLCR installed: mpirun, ompi-checkpoint and ompi-restart worked with that version. - I wanted to update to

[OMPI users] dynamic rules

2010-01-15 Thread Roman Martonak
On my machine I need to use dynamic rules to enforce the bruck or pairwise algorithm for alltoall, since unfortunately the default basic linear algorithm performs quite poorly on my InfiniBand network. A few months ago I noticed that in the case of VASP, however, the use of dynamic rules via --mca coll_t

Re: [OMPI users] Windows CMake build problems ... (cont.)

2010-01-15 Thread Shiqing Fan
Hi Charlie, Glad to hear that you compiled it successfully. The error you got with 1.3.4 is a bug in which the CMake script didn't set the SVN information correctly; it has been fixed in 1.4 and later. Thanks, Shiqing cjohn...@valverdecomputing.com wrote: Yes that was it. A much improve

Re: [OMPI users] MPI debugger

2010-01-15 Thread Ashley Pittman
On 11 Jan 2010, at 06:20, Jed Brown wrote: > On Sun, 10 Jan 2010 19:29:18 +, Ashley Pittman > wrote: >> It'll show you parallel stack traces but won't let you single step for >> example. > > Two lightweight options if you want stepping, breakpoints, watchpoints, > etc. > > * Use serial de

[OMPI users] Open MPI v1.4.1 released

2010-01-15 Thread Ralph Castain
The Open MPI Team, representing a consortium of research, academic, and industry partners, is pleased to announce the release of Open MPI version 1.4.1. This release is strictly a bug fix release over the v1.4 release. Version 1.4.1 can be downloaded from the main Open MPI web site or any of its m

Re: [OMPI users] Windows CMake build problems ... (cont.)

2010-01-15 Thread cjohnson
Yes that was it. A much improved result now from CMake 2.6.4, no errors from compiling openmpi-1.4: 1>libopen-pal - 0 error(s), 9 warning(s) 2>libopen-rte - 0 error(s), 7 warning(s) 3>opal-restart - 0 error(s), 0 warning(s) 4>opal-wrapper - 0 error(s), 0 warning(s) 5>libmpi - 0 error(s), 42 warning(s) 6>o

Re: [OMPI users] Checkpoint/Restart error

2010-01-15 Thread Andreea Costea
Hi... still not working. Though I uninstalled OpenMPI with make uninstall and I manually deleted all other files, I still have the same error when checkpointing. Any idea? Thanks, Andreea On Thu, Jan 14, 2010 at 10:38 PM, Joshua Hursey wrote: > On Jan 14, 2010, at 8:20 AM, Andreea Costea wrote