On 21 August 2023 at 16:05, Ivan Krylov wrote:
| Dirk is probably right that it's a good idea to have OMP_THREAD_LIMIT=2
| set on the CRAN check machine. Either that, or place the responsibility
| on data.table for setting the right number of threads by default. But
| that's a policy question: should a CRAN package start no more than two
| threads/child processes even if it doesn't know it's running in an
| environment where the CPU time / elapsed time limit is two?
Methinks that, given this language in the CRAN Repository Policy,

   If running a package uses multiple threads/cores it must never use
   more than two simultaneously: the check farm is a shared resource
   and will typically be running many checks simultaneously.

it would indeed be nice if this variable, and/or equivalent ones, were set.

As I mentioned before, I added a similar throttle (not for data.table) long
ago to a package I look after (for work, even). So below is a similar
throttler, with some optionality. I'll add this to my `dang` package, which
collects various functions.

A usage example follows. Nothing happens unless the function is called, so
data.table keeps 'full power' by default. When called, it uses the smaller of
two possible settings (the R option 'Ncpus', falling back to 2 if unset, and
the environment variable OMP_THREAD_LIMIT), or an explicit count:

> dang::limitDataTableCores(verbose=TRUE)
Limiting data.table to '12'.
> Sys.setenv("OMP_THREAD_LIMIT"=3); dang::limitDataTableCores(verbose=TRUE)
Limiting data.table to '3'.
> options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE)
Limiting data.table to '2'.
> dang::limitDataTableCores(1, verbose=TRUE)
Limiting data.table to '1'.
>

That makes it, in my eyes, preferable to any unconditional 'always pick
1 thread' approach.

Dirk


##' Set threads for data.table respecting possible local settings
##'
##' This function sets the number of threads \pkg{data.table} will use
##' while reflecting two possible machine-specific settings from the
##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R
##' option \sQuote{Ncpus} (used e.g. for parallel builds).
##' @title Set data.table threads respecting default settings
##' @param ncores A numeric or character variable with the desired
##' count of threads to use
##' @param verbose A logical value with a default of \sQuote{FALSE} to
##' operate more verbosely
##' @return The return value of the \pkg{data.table} function
##' \code{setDTthreads} which is called as a side-effect.
##' @author Dirk Eddelbuettel
##' @export
limitDataTableCores <- function(ncores, verbose = FALSE) {
    if (missing(ncores)) {
        ## start with a simple fallback: 'Ncpus' (if set) or else 2
        ncores <- getOption("Ncpus", 2L)
        ## also consider OMP_THREAD_LIMIT (cf Writing R Extensions), gets NA if envvar unset
        ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT"))
        ## and then keep the smaller
        ncores <- min(na.omit(c(ncores, ompcores)))
    }
    stopifnot("Package 'data.table' must be installed." =
                  requireNamespace("data.table", quietly=TRUE))
    stopifnot("Argument 'ncores' must be numeric or character" =
                  is.numeric(ncores) || is.character(ncores))
    if (verbose) message("Limiting data.table to '", ncores, "'.")
    data.table::setDTthreads(ncores)
}


|
| --
| Best regards,
| Ivan
|
| ______________________________________________
| R-package-devel@r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-package-devel

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
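The selection rule in limitDataTableCores() can be checked without data.table installed. What follows is a minimal sketch, not part of `dang`, that factors out only what happens when 'ncores' is missing: take the 'Ncpus' option (falling back to 2), take OMP_THREAD_LIMIT if it is set, and keep the smaller. The helper name chooseCores() and its two arguments are hypothetical, chosen so the inputs can be passed explicitly instead of read from the session.

```r
## Hypothetical helper, not part of dang: the core-selection rule that
## limitDataTableCores() applies when 'ncores' is missing, minus data.table.
chooseCores <- function(ncpus = getOption("Ncpus", 2L),
                        omp = Sys.getenv("OMP_THREAD_LIMIT")) {
    ## an unset ("") or non-numeric OMP_THREAD_LIMIT becomes NA ...
    ompcores <- suppressWarnings(as.integer(omp))
    ## ... and is dropped by na.omit(), so only actual limits are compared
    min(stats::na.omit(c(as.integer(ncpus), ompcores)))
}

chooseCores(12L, "")    # OMP_THREAD_LIMIT unset: 12
chooseCores(12L, "3")   # OMP_THREAD_LIMIT is smaller: 3
chooseCores(2L, "3")    # Ncpus is smaller: 2
```

These three calls mirror the first three lines of the session shown earlier. Unlike the original, the sketch coerces 'ncpus' to integer so that a character option value is not compared as a string.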