Reported: 2 (out of 2) daemons - 4 (out of 4) procs
Then it hung there. So things spawned anyway, which is progress.
I'm just not sure whether that's the expected behavior for parSapply or not.
Jim
-----Original Message-----
From: Martin Morgan [mailto:mtmor...@fhcrc.org]
Sent: Wednesday, September 03, 2014 5:08 PM
To: Leek, Jim; r-help@r-project.org
Subject: Re: [R] snow/Rmpi without MPI.spawn?
On 09/03/2014 03:25 PM, Jim Leek wrote:
I'm a programmer at a high-performance computing center. I'm not very
familiar with R, but I have used MPI from C, C++, and Python. I have to
run an R code provided by a guy who knows R but not MPI. This fellow
used the R snow library to parallelize his R code (at least in theory;
I'm not actually sure exactly what he did). I need to get this code
running on our machines.
However, Rmpi and snow seem to require MPI spawn (MPI_Comm_spawn), which
our computing center doesn't support. I even tried building Rmpi against
MPICH1 instead of MPICH2, because Rmpi has that option, but it still
tries to use spawn.
I can launch plenty of processes, but I have to launch them all at
once at the beginning. Is there any way to convince Rmpi to just use
those processes rather than trying to spawn its own? I haven't found
any documentation on this issue, although I would've thought it
would be quite common.
This script
spawn.R
=======
# salloc -n 12 orterun -n 1 R -f spawn.R
library(Rmpi)
## Recent Rmpi bug -- should be mpi.universe.size()
nWorkers <- mpi.universe.size()
mpi.spawn.Rslaves(nslaves=nWorkers)
mpiRank <- function(i)
    c(i=i, rank=mpi.comm.rank())
mpi.parSapply(seq_len(2*nWorkers), mpiRank)
mpi.close.Rslaves()
mpi.quit()
can be run as the comment suggests

salloc -n 12 orterun -n 1 R -f spawn.R

which uses SLURM (or whatever job manager you have) to allocate resources
for 12 tasks and then spawns within that allocation. Maybe that's 'good
enough' -- spawning within the assigned allocation? Likely this requires
only minimal modification of the current code.
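As a quick sanity check that the allocation is actually visible to Rmpi,
something along these lines (just a sketch; adapt the launch line to your
site) can be run the same way before worrying about spawning:

check.R
=======
## salloc -n 12 orterun -n 1 R -f check.R
library(Rmpi)
cat("universe size:", mpi.universe.size(), "\n")   # slots in the allocation
cat("world size:", mpi.comm.size(0L), "\n")        # processes started by orterun (here 1)
mpi.quit()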
More extensive is to revise the manager/worker-style code to
something more like single instruction, multiple data
simd.R
======
## salloc -n 4 orterun R --slave -f simd.R
sink("/dev/null") # don't capture output -- more care needed here
library(Rmpi)
TAGS = list(FROM_WORKER=1L)
.comm = 0L
## shared `work', here just determine rank and host
work = c(rank=mpi.comm.rank(.comm),
host=system("hostname", intern=TRUE))
if (mpi.comm.rank(.comm) == 0) {
## manager
mpi.barrier(.comm)
nWorkers = mpi.comm.size(.comm)
res = list(nWorkers)
for (i in seq_len(nWorkers - 1L)) {
res[[i]] <- mpi.recv.Robj(mpi.any.source(), TAGS$FROM_WORKER,
comm=.comm)
}
res[[nWorkers]] = work
sink() # start capturing output
print(do.call(rbind, res))
} else {
## worker
mpi.barrier(.comm)
mpi.send.Robj(work, 0L, TAGS$FROM_WORKER, comm=.comm)
}
mpi.quit()
but this likely requires some serious code revision; if going this route,
http://r-pbd.org/ might be helpful (it comes from a similar HPC
environment).
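For a flavor of that style, a rough pbdMPI sketch of the same rank/host
report might look like the following (untested here, so treat the details
as approximate):

pbd.R
=====
## mpirun -np 4 Rscript pbd.R   -- all ranks launched up front, no spawning
library(pbdMPI)
init()
work <- c(rank=comm.rank(), host=Sys.info()[["nodename"]])
res <- allgather(work)            # one element per rank, available on every rank
comm.print(do.call(rbind, res))   # printed once, by rank 0
finalize()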
It's always worth asking whether the code is written to be efficient
in R -- a
typical 'mistake' is to write R-level explicit 'for' loops that
"copy-and-append" results, along the lines of
len <- 100000
result <- NULL
for (i in seq_len(len))
    ## some complicated calculation, then...
    result <- c(result, sqrt(i))
whereas it's much better to "pre-allocate and fill"
result <- numeric(len)
for (i in seq_len(len))
    result[[i]] <- sqrt(i)
or
lapply(seq_len(len), sqrt)
and very much better still to 'vectorize'
result <- sqrt(seq_len(len))
(timings for me were about 1 minute for "copy-and-append", 0.2 s for
"pre-allocate and fill", and 0.002 s for "vectorize").
Pushing back on the guy providing the code (grep for "for" loops, and
look for
that copy-and-append pattern) might save you from having to use parallel
evaluation at all.
Martin
Thanks,
Jim