On Sat, 9 Oct 2021, Viechtbauer, Wolfgang (SP) wrote:

One thing I did not see mentioned in this thread (pun intended) so far:

For what kind of computations is multithreading supposed to be used within the 
package being developed? If the computations involve a lot of linear/matrix 
algebra, then one could just use R with other linear algebra routines (e.g., 
OpenBLAS, Atlas, MKL, BLIS) and get the performance benefits of multicore 
processing of those computations without having to change a single line of code 
in the package (although in my experience, most of the performance benefits 
come from switching to something like OpenBLAS and using it single-threaded).

This is meant for the RMVL package, which memory-maps MVL-format files for direct access. The package also provides database functionality.

The files I am interested in are large. For example, the Gaia DR3 dataset is 500GB+.

Plain linear algebra will likely not need multithreading: the computation will proceed at the speed of storage I/O (which is quite impressive nowadays).

But it will be useful to multithread more involved code that builds or queries indices. I was also thinking of some functions to assist with visualization, since plot() and xyplot() were not meant for very long vectors.
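
As a rough illustration of the visualization point, here is a minimal sketch of one common way to make a very long vector plottable: reduce it to per-bin minima and maxima before handing it to plot(). The helper downsample_minmax() and its nbins argument are hypothetical names made up for this example; they are not part of RMVL or of base R.

```r
# Hypothetical helper (not part of RMVL): reduce a long numeric vector to
# the minimum and maximum of each bin, so a line plot keeps its envelope
# while drawing only 2 * nbins points.
downsample_minmax <- function(y, nbins = 1000L) {
  n <- length(y)
  if (n <= 2L * nbins) {
    # Short enough already: return the data unchanged.
    return(data.frame(x = seq_len(n), y = y))
  }
  # Assign consecutive runs of indices to bins of equal width.
  width <- ceiling(n / nbins)
  bin <- (seq_len(n) - 1L) %/% width + 1L
  lo <- tapply(y, bin, min)
  hi <- tapply(y, bin, max)
  centers <- tapply(seq_len(n), bin, mean)
  data.frame(
    x = rep(as.numeric(centers), each = 2L),
    # rbind() interleaves the values as lo1, hi1, lo2, hi2, ...
    y = as.numeric(rbind(lo, hi))
  )
}

set.seed(1)
y <- cumsum(rnorm(1e6))               # a long random walk
d <- downsample_minmax(y, nbins = 500L)
# plot(d$x, d$y, type = "l")          # 1000 points instead of 1e6
```

For data that does not fit in memory, the same reduction would presumably have to run chunk by chunk over the memory-mapped vector, which is the kind of loop that could benefit from multithreading.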

Ideally, one would be able to explore such large data sets interactively, and then do more interesting things on the cluster.


This aside, I am personally more in favor of explicitly parallelizing those 
things that are known to be embarrassingly parallelizable using packages like 
parallel, future, etc. since a package author should know best when these 
situations arise and can take the necessary steps to parallelize those 
computations -- but making the use of parallel processing in these cases an 
option, not a default. I have seen way too many cases in HPC environments where 
jobs are being parallelized, the package is doing parallel processing, and 
multicore linear algebra routines are being used all simultaneously, which is 
just a disaster.
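
A minimal sketch of the "option, not a default" idea, using the parallel package that ships with R. The function chunk_sums() and its arguments chunk_size and ncores are hypothetical names chosen for this example; the point is only that the serial path is the default and the caller has to ask for workers explicitly.

```r
library(parallel)

# Hypothetical package function: serial by default, parallel only when the
# caller passes ncores > 1.
chunk_sums <- function(x, chunk_size = 1e5, ncores = 1L) {
  if (length(x) == 0L) return(0)
  starts <- seq(1, length(x), by = chunk_size)
  # Sum one chunk, given its starting index.
  work <- function(s) sum(x[s:min(s + chunk_size - 1, length(x))])
  if (ncores > 1L) {
    cl <- makeCluster(ncores)
    on.exit(stopCluster(cl), add = TRUE)   # always release the workers
    res <- parLapply(cl, starts, work)
  } else {
    res <- lapply(starts, work)
  }
  sum(unlist(res))
}

x <- as.numeric(1:250000)
total <- chunk_sums(x)                 # serial unless told otherwise
# chunk_sums(x, ncores = 2L)           # explicit opt-in to two workers
```

With this shape, a job that is already parallelized at the scheduler level can leave ncores at 1 and avoid oversubscribing the node.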

Finally, I don't think the HPC task view has been mentioned so far:

https://cran.r-project.org/web/views/HighPerformanceComputing.html

Thanks for the link!

I see there is an OpenCL package, very interesting.

best

Vladimir Dergachev


(not even by Dirk just now, who maintains it!)

Best,
Wolfgang

-----Original Message-----
From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On Behalf Of Dirk Eddelbuettel
Sent: Saturday, 09 October, 2021 18:33
To: Ben Bolker
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] [Tagged] Re: multithreading in packages


On 9 October 2021 at 12:08, Ben Bolker wrote:
|    FWIW there is some machinery in the glmmTMB package for querying,
| setting, etc. the number of OpenMP threads.
|
| https://github.com/glmmTMB/glmmTMB/search?q=omp

https://cloud.r-project.org/package=RhpcBLASctl
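
For illustration, a minimal sketch of how RhpcBLASctl might be used to pin BLAS and OpenMP to a given number of threads, for example inside a job that is already parallelized at a higher level. The wrapper limit_threads() is a hypothetical name for this example, and the calls are guarded so the code still runs when the package is not installed.

```r
# Hypothetical wrapper: limit BLAS and OpenMP threads when RhpcBLASctl is
# available. Returns TRUE if the limits were applied, FALSE otherwise.
limit_threads <- function(n = 1L) {
  if (!requireNamespace("RhpcBLASctl", quietly = TRUE)) {
    return(FALSE)
  }
  RhpcBLASctl::blas_set_num_threads(n)   # threads used by the BLAS library
  RhpcBLASctl::omp_set_num_threads(n)    # threads used by OpenMP regions
  TRUE
}

applied <- limit_threads(1L)
```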

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
