We have about 50 Mac computers, all ARM, distributed across our
offices.  They range from a few M1s with 8 GB all the way to M4s with
64 GB.  (The $600 M4 Mac mini is an amazing compute engine!)

These computers are mostly idle overnight.  We have no interest in
Bitcoin mining, and SETI@home no longer seems very active either.  But
it's 2025 now, so maybe there is something better we could do with all
this idle compute power for our own statistical analyses.  Maybe we
could cluster them overnight.

I could likely convince my colleagues to run a cron job (or, since
these are Macs, a launchd job via launchctl) that starts a listener at
7pm and stops it around 7am, sharing, say, 80% of their memory and CPU
plus perhaps 32 GB of SSD.  I won't be able to actively administer
their computers, so the client has to be easy for them to install, turn
on, and turn off; it has to accept programs and inputs, cache some of
the data, and send back output.  (The sharing would only be on the
local network, not over the internet, which should make them more
comfortable with it.)
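
For concreteness, the node-side gate could be as simple as a small R
guard invoked by that cron/launchd job; this sketches only the time
window, not the worker mechanism itself (the script name is made up):

    ## worker-guard.R: sketch only, refuse to run outside 7pm-7am.
    ## The actual worker (ssh-spawned R process, TCP listener, ...) is
    ## whatever the chosen clustering tool provides.
    hour <- as.integer(format(Sys.time(), "%H"))
    if (!(hour >= 19 || hour < 7)) {
        quit(status = 1)   # outside the sharing window: don't start
    }
    ## ... start or keep the worker alive here ...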

Ideally, we would then have an R frontend (controller) that could run
`mclapply` statements on this Franken-computer and be smart about how
to distribute the load.  For example, an M4 is about 1.5x as fast as an
M1 on a single core, and it's easy to count cores.  If my job is
estimated to need 4 GB per core, I presumably wouldn't want to start 50
processes on a computer with 10 cores and 8 GB.  If the frontend
estimates that the upload and download would take longer than the
compute savings, it should just forget about distributing.  And so on:
reasonable rules, perhaps specified by the user and/or assessed from a
few local `mclapply` runs first.  It's almost like profiling the job
for a few minutes or a few iterations locally, and then deciding
whether to farm out parts of it to the other nodes on this Franken-net.
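
For what it's worth, base R's parallel package can already do a crude
version of this over ssh: makePSOCKcluster() takes a vector of
hostnames and parLapplyLB() load-balances across slow and fast nodes.
The hostnames, speed factors, and the ~100 MB/s network guess below are
all made-up assumptions, but it shows the shape of the capacity and
break-even rules I mean:

    library(parallel)

    ## Made-up inventory; cores/RAM/speed factors are assumptions.
    nodes <- data.frame(
        host   = c("m4-mini-01", "m4-mini-02", "m1-air-03"),
        cores  = c(10, 10, 8),
        ram_gb = c(64, 64, 8),
        speed  = c(1.5, 1.5, 1.0)    # M4 ~ 1.5x an M1 per core
    )

    ## Capacity rule: one worker per core, but no more workers than
    ## RAM allows at ~4 GB per task (the 8 GB M1 gets 2, not 8).
    gb_per_task <- 4
    workers <- pmin(nodes$cores, floor(nodes$ram_gb / gb_per_task))

    ## Break-even rule: profile a few iterations locally first.
    f <- sqrt                        # stand-in for the real task
    x <- as.list(1:1000)
    t_iter  <- system.time(lapply(x[1:20], f))[["elapsed"]] / 20
    t_ship  <- as.numeric(object.size(x)) / 100e6   # ~100 MB/s LAN
    t_local <- length(x) * t_iter
    t_dist  <- t_ship + length(x) * t_iter / sum(workers * nodes$speed)

    if (t_dist < t_local) {
        ## Requires passwordless ssh to each host and R on its PATH.
        cl  <- makePSOCKcluster(rep(nodes$host, times = workers))
        res <- parLapplyLB(cl, x, f)   # load-balanced across nodes
        stopCluster(cl)
    } else {
        res <- mclapply(x, f, mc.cores = detectCores())
    }

Of course this has none of the caching, fault tolerance, or easy on/off
client I described above, which is exactly why I am asking.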

I am not holding my breath on ChatGPT and artificial intelligence, of
course.  However, this seems like a hard but feasible engineering
problem.  Is there a vendor who sells a plug-and-play solution to it?
I am guessing a setup like ours is not unusual, though the upper price
bound on the software is of course just the cost of buying one giant
homogeneous computer or renting Amazon resources.

Pointers appreciated.

/iaw
