Good points... I'll see if anything can be done to speed up the master. If we
can shrink the number of MPI processes without hurting overall throughput,
maybe I could save enough to fit another run on the freed cores. Thanks for the
ideas!
I was also worried about contention on the nodes since I
I think the 'middle ground' approach can be simplified even further if
the data file is on a shared device (e.g. an NFS/Samba mount) that can be
mounted at the same location in the file system tree on all nodes. I
have never tried it, though, and mmap()'ing a non-POSIX-compliant file
system such as Samba
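As a rough illustration of that "shared mount plus mmap()" idea (not anything from the original posters), a sketch in C might look like the following; the path /shared/data.bin and the assumption that the file holds an array of doubles are purely illustrative:

/* Sketch: map a read-only data file that is mounted at the same path on
   every node (e.g. over NFS).  The path and element type are illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/shared/data.bin";      /* hypothetical shared-mount path */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Read-only shared mapping: the pages come from the page cache, so all
       processes on the same node reuse the same physical memory. */
    const double *data = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);                                  /* the mapping stays valid */

    printf("first value: %g\n", data[0]);

    munmap((void *)data, st.st_size);
    return 0;
}

Each rank runs the same code independently; within one node the kernel backs all the mappings with the same pages, so the file is not duplicated per process.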
It seems to me there are two extremes.
One is that you replicate the data for each process. This has the
disadvantage of consuming lots of memory "unnecessarily."
Another extreme is that shared data is distributed over all processes.
This has the disadvantage of making at least some of the
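For concreteness, here is a hedged sketch of the second extreme using plain MPI (the array length N and the use of MPI_Scatterv are illustrative assumptions, not the poster's code): rank 0 scatters disjoint slices so that no rank keeps a full copy, at the cost of needing communication whenever another rank's slice is touched.

/* Sketch of the "distribute over all processes" extreme (illustrative):
   rank 0 owns the full array and scatters disjoint slices, so no rank
   keeps a complete copy. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1000000;                      /* illustrative total size */
    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    for (int i = 0, off = 0; i < size; i++) {
        counts[i] = N / size + (i < N % size);  /* spread the remainder    */
        displs[i] = off;
        off += counts[i];
    }

    double *full = NULL;
    if (rank == 0)
        full = malloc(N * sizeof(double));      /* read/fill the data here */
    double *mine = malloc(counts[rank] * sizeof(double));

    /* Each rank ends up with only its slice; touching another rank's slice
       later requires explicit communication, which is the drawback. */
    MPI_Scatterv(full, counts, displs, MPI_DOUBLE,
                 mine, counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);

    free(mine); free(full); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}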
Terry,
You were right, the error indeed seems to come from the message coalescing
feature.
If I turn it off using the "--mca btl_openib_use_message_coalescing 0", I'm not
able to observe the "hdr->tag=0" error.
There are some Trac requests associated with a very similar error
(https://svn.open-mpi
Terry,
No, I haven't tried any other values than P,65536,256,192,128 yet.
The reason is quite simple: I've been reading and re-reading this thread
to understand the meaning of btl_openib_receive_queues, and I can't figure out
why the default values seem to induce the "hdr->tag=0" issue
(ht
Amb
It sounds like you have more workers than you can keep fed. Workers are
finishing up and requesting their next assignment but sit idle because
there are so many other idle workers too.
Load balance does not really matter if the choke point is the master. The
work is being done as fast as
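To illustrate the pattern being described (this is a generic sketch, not the poster's program): a single master answers one work request at a time, so once requests arrive faster than it can serve them, additional workers simply queue up idle.

/* Minimal master/worker sketch (generic, not the original program):
   workers ask rank 0 for the next item; once requests arrive faster than
   the master can answer them, the extra workers just sit and wait. */
#include <mpi.h>

#define TAG_REQUEST 1
#define TAG_WORK    2
#define TAG_STOP    3

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int total_items = 1000;               /* illustrative workload */

    if (rank == 0) {                            /* master: the choke point */
        int next = 0, stopped = 0, dummy, item = 0;
        MPI_Status st;
        while (stopped < size - 1) {
            /* Requests are served strictly one at a time. */
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &st);
            if (next < total_items) {
                item = next++;
                MPI_Send(&item, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
            } else {
                MPI_Send(&item, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                stopped++;
            }
        }
    } else {                                    /* worker */
        int dummy = 0, item;
        MPI_Status st;
        for (;;) {
            MPI_Send(&dummy, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(&item, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            /* ... process 'item' here ... */
        }
    }

    MPI_Finalize();
    return 0;
}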
That is already an answer that makes sense. I understand that it is really
not a trivial issue. I have seen other recent threads about "running on
crashed nodes", and I know the Open MPI team is working hard on it. Well, we
will wait and be glad to test the first versions when (I understand it will
take
The data are read from a file and processed before calculations begin, so I
think that mapping will not work in our case.
Global Arrays do look promising indeed. As I said, we need to put just a part
of the data into the shared section. John, do you (or maybe other users) have
experience working wit
I completely neglected to mention that you could also use hwloc (Hardware
Locality), a small utility library for discovering topology-related things
(including whether you're bound, where you're bound, etc.). Hwloc is a sub-project
of Open MPI:
http://www.open-mpi.org/projects/hwloc/
Open MPI us
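For reference, a minimal sketch of querying the current binding with hwloc's C API might look like this (written against the hwloc 1.x bitmap interface; adjust to the installed version):

/* Sketch: report where the calling process is currently bound, via hwloc. */
#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    if (hwloc_get_cpubind(topo, set, HWLOC_CPUBIND_PROCESS) == 0) {
        char *str;
        hwloc_bitmap_asprintf(&str, set);       /* e.g. "0x0000000f" */
        printf("process is bound to cpuset %s\n", str);
        free(str);
    } else {
        printf("could not query the binding\n");
    }

    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topo);
    return 0;
}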
That is interesting. So does the number of processes affect your runs
at all? The times I've seen hdr->tag be 0, it has usually been due to protocol
issues; the tag should never be 0. Have you tried any receive_queues
settings other than the default and the one you mention?
I wonder if you
As one of the Open MPI developers actively working on the MPI layer
stabilization/recovery feature set, I don't think we can give you a specific
timeframe for availability, especially availability in a stable release. Once
the initial functionality is finished, we will open it up for user testing
Open MPI's fault tolerance is still somewhat rudimentary; it's a complex topic
within the entire scope of MPI. There has been much research into MPI and
fault tolerance over the years; the MPI Forum itself is grappling with terms
and definitions that make sense. It's by no means a "solved" pro
On the OMPI SVN trunk, we have an "Open MPI extension" call named
OMPI_Affinity_str(). Below is an excerpt from the man page. If this is
desirable, we can probably get it into 1.5.1.
-
NAME
OMPI_Affinity_str - Obtain prettyprint strings of processor affinity
information f
On 24.09.2010 at 13:26, John Hearns wrote:
> On 24 September 2010 08:46, Andrei Fokau wrote:
>> We use a C-program which consumes a lot of memory per process (up to a few
>> GB), 99% of the data being the same for each process. So for us it would be
>> quite reasonable to put that part of the data in
On 24 September 2010 08:46, Andrei Fokau wrote:
> We use a C-program which consumes a lot of memory per process (up to a few
> GB), 99% of the data being the same for each process. So for us it would be
> quite reasonable to put that part of the data in shared memory.
http://www.emsl.pnl.gov/docs/glo
Is the data coming from a read-only file? In that case, a better way
might be to memory-map that file in the root process and share the map
pointer with all the slave threads. This, like shared memory, will work
only for processes within a node, of course.
On Fri, Sep 24, 2010 at 3:46 AM, Andrei Fo
We use a C-program which consumes a lot of memory per process (up to a few
GB), 99% of the data being the same for each process. So for us it would be
quite reasonable to put that part of the data in shared memory.
In the source code, the memory is allocated via the malloc() function. What
would it requir
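To make the question concrete, here is one hedged sketch of what such a change could look like, using a per-node POSIX shared-memory segment in place of malloc(); the segment name, the "one designated rank per node creates and fills it" convention, and the omitted synchronization/cleanup are all assumptions for illustration:

/* Sketch: replace a large malloc() with a per-node POSIX shared-memory
   segment.  The name, size handling, and creator convention are illustrative;
   real code needs a per-node barrier before readers attach and shm_unlink()
   at shutdown.  Link with -lrt on Linux. */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

void *attach_shared_block(size_t bytes, int i_am_node_master)
{
    const char *name = "/my_app_shared";        /* hypothetical segment name */
    int flags = i_am_node_master ? (O_CREAT | O_RDWR) : O_RDWR;

    int fd = shm_open(name, flags, 0600);
    if (fd < 0) return NULL;
    if (i_am_node_master && ftruncate(fd, bytes) < 0) { close(fd); return NULL; }

    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                                  /* the mapping stays valid */
    if (p == MAP_FAILED) return NULL;

    /* One rank per node initializes the contents; the others must wait
       (e.g. on a barrier) before reading.  The returned pointer replaces
       the old malloc(bytes) result. */
    return p;
}

The other ranks on the same node call the same function with i_am_node_master set to 0 and get a pointer to the same physical pages, so the 99% of shared data exists only once per node.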
Ralph, could you tell us when this functionality will be available in the
stable version? A rough estimate will be fine.
On Fri, Sep 24, 2010 at 01:24, Ralph Castain wrote:
> In a word, no. If a node crashes, OMPI will abort the currently-running job
> if it had processes on that node. There is
Hello,
My question concerns the display of the error message generated by a throw
std::runtime_error("Explicit error message").
I am launching an Open MPI program on several machines from a terminal using:
mpirun -v -machinefile MyMachineFile.txt MyProgram.
I am wondering why I cannot see an error messag
Hi Terry,
The messages being sent/received can be of any size, but the error seems to
happen more often with small messages (such as an int being broadcast or
allreduced).
The failing communication differs from one run to another, but some spots are
more likely to fail than others. And as f