On Sat, 9 Oct 2021, Viechtbauer, Wolfgang (SP) wrote:
> One thing I did not see mentioned in this thread (pun intended) so far:
> For what kind of computations is multithreading supposed to be used within the
> package being developed? If the computations involve a lot of linear/matrix
> algebra, then one could just use R with other linear algebra routines (e.g.,
> OpenBLAS, ATLAS, MKL, BLIS) and get the performance benefits of multicore
> processing of those computations without having to change a single line of code
> in the package (although in my experience, most of the performance benefits
> come from switching to something like OpenBLAS and using it single-threaded).

This is meant for the RMVL package, which memory maps MVL format files for
direct access. The package also provides database functionality.
The files I am interested in are large. For example, the Gaia DR3 dataset
is 500GB+.
Plain linear algebra will likely not need multithreading - the computation
will proceed at the speed of storage I/O (which is quite impressive
nowadays).
But it will be useful to multithread more involved code that builds or
queries indices, and I was also thinking of some functions to assist with
visualization - plot() and xyplot() were not meant for very long vectors.
Ideally, one would be able to explore such large data sets interactively.
And then do more interesting things on the cluster.
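
To make the visualization idea concrete, here is a minimal sketch of one way to shrink a very long vector before plotting: keep only the minimum and maximum of each contiguous bin. The name bin_minmax and its signature are hypothetical and not part of RMVL. The thread count is an explicit argument, and the OpenMP pragma is guarded so the code still builds and runs serially when OpenMP is unavailable.

```c
#include <stddef.h>
#include <math.h>

/* Hypothetical helper (not an RMVL function): reduce x[0..n-1] to per-bin
 * minima and maxima, so a plot needs at most 2*nbins points.
 * NaN values are skipped; an empty or all-NaN bin yields NAN for both.
 * nthreads is chosen by the caller; values below 1 are treated as 1. */
void bin_minmax(const double *x, size_t n, size_t nbins,
                double *bin_min, double *bin_max, int nthreads)
{
    if (nbins == 0) return;
    if (nthreads < 1) nthreads = 1;
#ifndef _OPENMP
    (void) nthreads;               /* unused in a serial build */
#endif

    size_t chunk = n / nbins;      /* base bin size */
    size_t rem = n % nbins;        /* first rem bins get one extra element */

    /* Bins cover disjoint index ranges and write to distinct outputs,
     * so the iterations are independent of each other. */
#ifdef _OPENMP
#pragma omp parallel for num_threads(nthreads) schedule(static)
#endif
    for (long long b = 0; b < (long long) nbins; b++) {
        size_t ub = (size_t) b;
        size_t start = ub * chunk + (ub < rem ? ub : rem);
        size_t end = start + chunk + (ub < rem ? 1 : 0);
        double lo = INFINITY, hi = -INFINITY;

        for (size_t i = start; i < end; i++) {
            if (x[i] < lo) lo = x[i];   /* comparisons with NaN are false */
            if (x[i] > hi) hi = x[i];
        }
        if (lo > hi) {                  /* nothing usable in this bin */
            lo = NAN;
            hi = NAN;
        }
        bin_min[b] = lo;
        bin_max[b] = hi;
    }
}
```

With a memory-mapped vector, x would simply point into the mapping, so for a single pass like this the speed of storage I/O is likely to matter more than the number of threads.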

> This aside, I am personally more in favor of explicitly parallelizing those
> things that are known to be embarrassingly parallelizable using packages like
> parallel, future, etc. since a package author should know best when these
> situations arise and can take the necessary steps to parallelize those
> computations -- but making the use of parallel processing in these cases an
> option, not a default. I have seen way too many cases in HPC environments where
> jobs are being parallelized, the package is doing parallel processing, and
> multicore linear algebra routines are being used all simultaneously, which is
> just a disaster.
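
One way to read "an option, not a default" in compiled code is sketched below, under stated assumptions: the package runs single-threaded unless the user explicitly asks for more, and any request is capped by a limit supplied by the caller. The function name parse_thread_request is hypothetical, as is the idea of reading the request from an environment variable.

```c
#include <stdlib.h>
#include <errno.h>

/* Hypothetical policy helper: turn a user-supplied setting (for example
 * the value of an environment variable) into a thread count.
 * Missing, empty, malformed, or non-positive input means 1 thread;
 * valid requests are capped at max_threads. */
int parse_thread_request(const char *value, int max_threads)
{
    char *end = NULL;
    long v;

    if (max_threads < 1) max_threads = 1;
    if (value == NULL || *value == '\0') return 1;   /* not requested */

    errno = 0;
    v = strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0' || v < 1)
        return 1;                                    /* reject bad input */

    return v > max_threads ? max_threads : (int) v;
}
```

A caller might pass getenv("MYPKG_NUM_THREADS") as value, where the variable name is made up for this example, and use the core count allotted to the job as max_threads, so that nested layers of parallelism do not multiply.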

> Finally, I don't think the HPC task view has been mentioned so far:
> https://cran.r-project.org/web/views/HighPerformanceComputing.html

Thanks for the link!
I see there is an OpenCL package, very interesting.
best
Vladimir Dergachev

> (not even by Dirk just now, who maintains it!)
> Best,
> Wolfgang
-----Original Message-----
From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On Behalf Of Dirk Eddelbuettel
Sent: Saturday, 09 October, 2021 18:33
To: Ben Bolker
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] [Tagged] Re: multithreading in packages
On 9 October 2021 at 12:08, Ben Bolker wrote:
| FWIW there is some machinery in the glmmTMB package for querying,
| setting, etc. the number of OpenMP threads.
|
| https://github.com/glmmTMB/glmmTMB/search?q=omp
https://cloud.r-project.org/package=RhpcBLASctl
Dirk
--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel