Thanks, that explains it :)

On Tue, Jan 19, 2010 at 15:01, Ralph Castain <r...@open-mpi.org> wrote:

> Shared memory doesn't extend between comm_spawn'd parent/child processes
> in Open MPI. Perhaps someday it will, but not yet.
>
>
> On Jan 19, 2010, at 1:14 PM, Nicolas Bock wrote:
>
> Hello list,
>
> I think I understand better now what's happening, although I still don't
> know why. I have attached two small C codes that demonstrate the problem.
> The code in main.c uses MPI_Comm_spawn() to start the code in the second
> source, child.c. I can force the issue by running the main.c code with
>
> mpirun -mca btl self,sm -np 1 ./main
>
> and get this error:
>
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
> Process 1 ([[26121,2],0]) is on host: mujo
> Process 2 ([[26121,1],0]) is on host: mujo
> BTLs attempted: self sm
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
>
> Is that because the spawned process is in a different group? They are
> still all running on the same host, so at least in principle they should
> be able to communicate with each other via shared memory.
>
> nick
>
>
> On Fri, Jan 15, 2010 at 16:08, Eugene Loh <eugene....@sun.com> wrote:
>
>> Dunno. Do lower np values succeed? If so, at what value of np does the
>> job no longer start?
>>
>> Perhaps it's having a hard time creating the shared-memory backing file
>> in /tmp. I think this is a 64-Mbyte file. If this is the case, try
>> reducing the size of the shared area per this FAQ item:
>> http://www.open-mpi.org/faq/?category=sm#decrease-sm
>> Most notably, reduce mpool_sm_min_size below 67108864.
>>
>> Also note trac ticket 2043, which describes problems with the sm BTL
>> exposed by GCC 4.4.x compilers. You need to get a sufficiently recent
>> build to solve this. But those problems don't occur until you start
>> passing messages, and here you're not even starting up.
>>
>>
>> Nicolas Bock wrote:
>>
>> Sorry, I forgot to give more details on what versions I am using:
>>
>> OpenMPI 1.4
>> Ubuntu 9.10, kernel 2.6.31-16-generic #53-Ubuntu
>> gcc (Ubuntu 4.4.1-4ubuntu8) 4.4.1
>>
>> On Fri, Jan 15, 2010 at 15:47, Nicolas Bock <nicolasb...@gmail.com> wrote:
>>
>>> Hello list,
>>>
>>> I am running a job on a quad-socket, quad-core AMD Opteron machine. It
>>> has 16 cores, which I can verify by looking at /proc/cpuinfo. However,
>>> when I run a job with
>>>
>>> mpirun -np 16 -mca btl self,sm job
>>>
>>> I get this error:
>>>
>>> --------------------------------------------------------------------------
>>> At least one pair of MPI processes are unable to reach each other for
>>> MPI communications. This means that no Open MPI device has indicated
>>> that it can be used to communicate between these processes. This is
>>> an error; Open MPI requires that all MPI processes be able to reach
>>> each other. This error can sometimes be the result of forgetting to
>>> specify the "self" BTL.
>>>
>>> Process 1 ([[56972,2],0]) is on host: rust
>>> Process 2 ([[56972,1],0]) is on host: rust
>>> BTLs attempted: self sm
>>>
>>> Your MPI job is now going to abort; sorry.
>>> --------------------------------------------------------------------------
>>>
>>> By adding the tcp btl I can run the job. I don't understand why Open MPI
>>> claims that a pair of processes cannot reach each other; all processor
>>> cores should have access to all memory, after all. Do I need to set some
>>> other btl limit?
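For reference, the workaround and the FAQ tuning discussed in the thread look roughly like this. The executable name `job` is taken from the original command; the 32 MB value is only an illustrative choice below the 64 MB default, not a value anyone in the thread recommended:

```shell
# Workaround mentioned in the thread: allow fallback to the tcp BTL
mpirun -np 16 -mca btl self,sm,tcp ./job

# Eugene's FAQ suggestion: shrink the shared-memory backing file by
# lowering mpool_sm_min_size below 67108864 bytes (32 MB shown here)
mpirun -np 16 -mca btl self,sm -mca mpool_sm_min_size 33554432 ./job
```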
>>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> <main.c><child.c>
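The attached main.c and child.c are not preserved in this archive. A minimal self-contained sketch of the same kind of test (a parent spawns one child with MPI_Comm_spawn and sends it a message) might look like the following; the self-spawning layout and the single-integer exchange are illustrative, not the original attachments:

```c
/* Hypothetical spawn test in the spirit of the thread's main.c/child.c.
 * Run as: mpirun -mca btl self,sm -np 1 ./spawn_test
 * With only the self,sm BTLs, the "unable to reach each other" error
 * described above would surface at spawn or at the first message. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent: spawn one copy of this same executable as the child. */
        MPI_Comm intercomm;
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

        int token = 42;
        MPI_Send(&token, 1, MPI_INT, 0, 0, intercomm);
        MPI_Comm_disconnect(&intercomm);
    } else {
        /* Child: receive the token from the parent over the
         * parent intercommunicator. */
        int token;
        MPI_Recv(&token, 1, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
        printf("child received %d\n", token);
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}
```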