Hello R-package-devel members,

I've got an idea for a package. I'm definitely reinventing a wheel, but I couldn't find anything that would fulfil 100% of my requirements.
I've got a computational experiment that takes a while to complete, but the set of machines that can run it varies during the day. For example, I can leave a computer running in my bedroom, but I'd rather turn it off for the night. For now, I work around the problem with a lot of caching [*], restarting the job with different cluster geometries and letting it load the parts that are already done from disk.

Here's a proof-of-concept implementation of a server that sits between the clients and a pool of compute nodes, dynamically distributing the tasks between the nodes: https://github.com/aitap/nodepool

In addition to letting nodes come and go as they please, it doesn't strain R's NCONNECTIONS limit on the nodes and clients (although the pool itself would still benefit from the limit being raised), and it only requires the pool machine to be reachable for inbound connections [**]. It's definitely not CRAN quality yet and at the very least needs a better task submission API, but it does seem to work.

Does it sound like it could be useful in your own work? Any ideas I could implement, besides those mentioned in the README?

Here's a terrible hack: since the pool speaks R's cluster protocol, one could, in theory, construct a mock-"cluster" object consisting of connections to the pool server and use parLapplyLB() to distribute a number of tasks between the pool nodes (a rough sketch is at the end of this message). But that's a bad idea for a lot of reasons.

-- 
Best regards,
Ivan

[*] I need caching anyway because some of my machines have hardware problems and may just reboot for no reason.

[**] Although Henrik Bengtsson's excellent parallelly::makeClusterPSOCK() makes it much less of a problem.
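P.S. In case it helps to picture the mock-"cluster" hack, here is a rough, untested sketch. The host name, port and number of connections are made up, and it assumes the pool accepts the same kind of connection a PSOCK worker would make; nodepool's real interface may well end up looking different.

library(parallel)

# Hypothetical pool address.
pool_host <- "pool.example.org"
pool_port <- 11000L

# One "node" per task slot: just a socket connection dressed up as a
# SOCKnode, so that parallel's sendData()/recvData() methods serialize
# to and from it.
mock_node <- function(host, port) {
  con <- socketConnection(host, port = port, blocking = TRUE,
                          open = "a+b", timeout = 2592000)
  structure(list(con = con), class = "SOCKnode")
}

# Dress the list of nodes up as a SOCKcluster so that parLapplyLB()
# and friends dispatch on it.
cl <- structure(lapply(1:4, function(i) mock_node(pool_host, pool_port)),
                class = c("SOCKcluster", "cluster"))

# Load-balanced dispatch of 100 toy tasks over the mock cluster.
res <- parLapplyLB(cl, 1:100, function(x) x^2)

stopCluster(cl)  # sends DONE to every connection and closes it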