I spoke a bit too soon. You may have noticed that it isn't hanging in parSapply; it's hanging in mpi.spawn.Rslaves(). It claims to have launched the slaves, but I can't see them by logging into the target node and running 'top'. I only see the top-level R process (which is burning up 100% of a CPU), so I don't know what's going on. In any case, it never returns from the spawn call.
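A minimal sanity check, assuming Rmpi at least loads under our MPI builds, would be to query the runtime without spawning anything:

    library(Rmpi)
    ## query-only calls -- no spawning involved
    cat("universe size:", mpi.universe.size(), "\n")
    cat("world comm size:", mpi.comm.size(0), "\n")
    mpi.quit()

If mpi.universe.size() reports something implausible (0 or 1) inside a multi-task allocation, the problem is presumably in the launcher integration rather than in the spawn call itself.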
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Leek, Jim
Sent: Wednesday, September 03, 2014 10:25 PM
To: Martin Morgan; r-help@r-project.org
Subject: Re: [R] snow/Rmpi without MPI.spawn?

Thanks for the tips. I'll take a look around for 'for' loops in the morning.

I think the example you provided worked for OpenMPI. (The default on our machine is MPICH2, but it gave the same error about calling spawn.) Anyway, with OpenMPI I got this:

> # salloc -n 12 orterun -n 1 R -f spawn.R
> library(Rmpi)
> ## Recent Rmpi bug -- should be mpi.universe.size()
> nWorkers <- mpi.universe.size()
> nslaves = 4
> mpi.spawn.Rslaves(nslaves)
        Reported: 2 (out of 2) daemons - 4 (out of 4) procs

Then it hung there. So things spawned anyway, which is progress. I'm just not sure whether or not that's expected behavior for parSapply.

Jim

-----Original Message-----
From: Martin Morgan [mailto:mtmor...@fhcrc.org]
Sent: Wednesday, September 03, 2014 5:08 PM
To: Leek, Jim; r-help@r-project.org
Subject: Re: [R] snow/Rmpi without MPI.spawn?

On 09/03/2014 03:25 PM, Jim Leek wrote:
> I'm a programmer at a high-performance computing center. I'm not very
> familiar with R, but I have used MPI from C, C++, and Python. I have to
> run an R code provided by a guy who knows R but not MPI. So this fellow
> used the R snow library to parallelize his R code (theoretically; I'm
> not actually sure what he did). I need to get this code running on our
> machines.
>
> However, Rmpi and snow seem to require MPI spawn, which our computing
> center doesn't support. I even tried building Rmpi with MPICH1 instead
> of 2, because Rmpi has that option, but it still tries to use spawn.
>
> I can launch plenty of processes, but I have to launch them all at once
> at the beginning. Is there any way to convince Rmpi to just use those
> processes rather than trying to spawn its own? I haven't found any
> documentation on this issue, although I would have thought it would be
> quite common.

This script spawn.R

=======
# salloc -n 12 orterun -n 1 R -f spawn.R
library(Rmpi)

## Recent Rmpi bug -- should be mpi.universe.size()
nWorkers <- mpi.universe.size()
mpi.spawn.Rslaves(nslaves=nWorkers)

mpiRank <- function(i)
    c(i=i, rank=mpi.comm.rank())
mpi.parSapply(seq_len(2 * nWorkers), mpiRank)

mpi.close.Rslaves()
mpi.quit()
=======

can be run like the comment suggests:

    salloc -n 12 orterun -n 1 R -f spawn.R

This uses slurm (or whatever job manager) to allocate resources for 12 tasks, and spawns within that allocation. Maybe that's 'good enough' -- spawning within the assigned allocation? Likely this requires minimal modification of the current code.
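If spawning within the allocation is acceptable, the snow-level equivalent is an ordinary MPI cluster. A sketch only -- I don't know how the original code creates its cluster, and the worker count and toy input here are illustrative:

    library(snow)
    ## type="MPI" makes snow spawn its workers via Rmpi
    cl <- makeCluster(4, type = "MPI")
    ## the same style of call the snow code presumably makes
    parSapply(cl, 1:8, function(i) i^2)
    stopCluster(cl)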
More extensive is to revise the manager/worker-style code to something more like single instruction, multiple data. This script simd.R

======
## salloc -n 4 orterun R --slave -f simd.R
sink("/dev/null")  # don't capture output -- more care needed here
library(Rmpi)

TAGS = list(FROM_WORKER=1L)
.comm = 0L

## shared 'work', here just determine rank and host
work = c(rank=mpi.comm.rank(.comm),
         host=system("hostname", intern=TRUE))

if (mpi.comm.rank(.comm) == 0) {
    ## manager
    mpi.barrier(.comm)
    nWorkers = mpi.comm.size(.comm)
    res = vector("list", nWorkers)  # pre-allocate the result list
    for (i in seq_len(nWorkers - 1L)) {
        res[[i]] <- mpi.recv.Robj(mpi.any.source(), TAGS$FROM_WORKER,
                                  comm=.comm)
    }
    res[[nWorkers]] = work
    sink()  # start capturing output
    print(do.call(rbind, res))
} else {
    ## worker
    mpi.barrier(.comm)
    mpi.send.Robj(work, 0L, TAGS$FROM_WORKER, comm=.comm)
}
mpi.quit()
======

but this likely requires some serious code revision; if going this route then http://r-pbd.org/ might be helpful (and it comes from a similar HPC environment).

It's always worth asking whether the code is written to be efficient in R -- a typical 'mistake' is to write R-level explicit 'for' loops that "copy-and-append" results, along the lines of

    len <- 100000
    result <- NULL
    for (i in seq_len(len))
        ## some complicated calculation, then...
        result <- c(result, sqrt(i))

whereas it's much better to "pre-allocate and fill"

    result <- numeric(len)  # sqrt() returns doubles, so pre-allocate numeric
    for (i in seq_len(len))
        result[[i]] <- sqrt(i)

or

    lapply(seq_len(len), sqrt)

and very much better still to 'vectorize'

    result <- sqrt(seq_len(len))

(Timings for me are about 1 minute for "copy-and-append", 0.2 s for "pre-allocate and fill", and 0.002 s for "vectorize"; a system.time() sketch for checking this appears below.) Pushing back on the guy providing the code (grep for 'for' loops, and look for that copy-and-append pattern) might save you from having to use parallel evaluation at all.

Martin

>
> Thanks,
> Jim

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
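The system.time() sketch referenced above, comparing the three styles; timings will vary by machine, and the 'complicated calculation' is just sqrt() here, so the absolute numbers are illustrative only:

    len <- 100000

    ## copy-and-append: c() re-allocates 'result' on every iteration
    system.time({
        result <- NULL
        for (i in seq_len(len))
            result <- c(result, sqrt(i))
    })

    ## pre-allocate and fill: a single allocation up front
    system.time({
        result <- numeric(len)
        for (i in seq_len(len))
            result[[i]] <- sqrt(i)
    })

    ## vectorized: the loop happens in compiled code
    system.time(result <- sqrt(seq_len(len)))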