Hi Uwe,

I agree and have also been burnt myself by programs occupying the maximum
number of cores available.
My understanding is that in the absence of explicit parallelisation, use of
data.table in a package should not lead to this type of behaviour?

Best,
Scott

On Wed, 23 Aug 2023 at 14:30, Uwe Ligges <lig...@statistik.tu-dortmund.de>
wrote:

> I (and many colleagues here) have been caught several times by the
> following example:
>
> 1. did something in parallel on a cluster, set up via
>    parallel::makeCluster().
> 2. e.g. allocated 20 cores and got them on one single machine
> 3. ran some code in parallel via parLapply()
>
> Bang! 400 threads;
> So I have started 20 parallel processes, each of which is using the
> automatically set max. 20 threads as OMP_THREAD_LIMIT was also adjusted
> by the cluster to 20 (rather than 1).
>
> Hence, I really believe a default should always be small, not only in
> examples and tests, but generally. And people who aim for more should be
> able to increase the defaults.
>
> Do you believe software that auto-occupies a 96-core machine with 96
> threads by default is sensible?
>
> Best,
> Uwe Ligges
>
> On 21.08.2023 21:59, Berry Boessenkool wrote:
> >
> > If you add that to each exported function, isn't that a lot of code to
> > read + maintain?
> > Also, it seems like unnecessary computational overhead.
> > From a software design point of view, it might be nicer to set that in
> > the examples + tests.
> >
> > Regards,
> > Berry
> >
> > ________________________________
> > From: R-package-devel <r-package-devel-boun...@r-project.org> on behalf
> > of Scott Ritchie <sritchi...@gmail.com>
> > Sent: Monday, August 21, 2023 19:23
> > To: Dirk Eddelbuettel <e...@debian.org>
> > Cc: r-package-devel@r-project.org <r-package-devel@r-project.org>
> > Subject: Re: [R-pkg-devel] Trouble with long-running tests on CRAN
> > debian server
> >
> > Thanks Dirk and Ivan,
> >
> > I took a slightly different work-around of forcing the number of threads
> > to 1 when running functions on the test dataset in the package, by adding
> > the following to each user-facing function:
> >
> > ```
> > # Check if running on package test_data, and if so, force data.table to
> > # be single threaded so that we can avoid a NOTE on CRAN submission
> > if (isTRUE(all.equal(x, ukbnmr::test_data))) {
> >   registered_threads <- getDTthreads()
> >   setDTthreads(1)
> >   # re-register so no unintended side effects for users
> >   on.exit({ setDTthreads(registered_threads) })
> > }
> > ```
> > (i.e. here x is the input argument to the function)
> >
> > It took some trial and error to get it to pass the CRAN tests; the number
> > of columns in the input data was also contributing to the problem.
> >
> > Best,
> >
> > Scott
> >
> >
> > On Mon, 21 Aug 2023 at 14:38, Dirk Eddelbuettel <e...@debian.org> wrote:
> >
> >>
> >> On 21 August 2023 at 16:05, Ivan Krylov wrote:
> >> | Dirk is probably right that it's a good idea to have OMP_THREAD_LIMIT=2
> >> | set on the CRAN check machine. Either that, or place the responsibility
> >> | on data.table for setting the right number of threads by default. But
> >> | that's a policy question: should a CRAN package start no more than two
> >> | threads/child processes even if it doesn't know it's running in an
> >> | environment where the CPU time / elapsed time limit is two?
> >>
> >> Methinks that given this language in the CRAN Repository Policy
> >>
> >>   If running a package uses multiple threads/cores it must never use more
> >>   than two simultaneously: the check farm is a shared resource and will
> >>   typically be running many checks simultaneously.
> >>
> >> it would indeed be nice if this variable, and/or equivalent ones, were set.
> >>
> >> As I mentioned before, I had long added a similar throttle (not for
> >> data.table) in a package I look after (for work, even). So a similar
> >> throttler with optionality is below. I'll add this to my `dang` package
> >> collecting various functions.
> >>
> >> A usage example follows. It does nothing by default, ensuring 'full power'
> >> but reflects the minimum of two possible options, or an explicit count:
> >>
> >>     > dang::limitDataTableCores(verbose=TRUE)
> >>     Limiting data.table to '12'.
> >>     > Sys.setenv("OMP_THREAD_LIMIT"=3); dang::limitDataTableCores(verbose=TRUE)
> >>     Limiting data.table to '3'.
> >>     > options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE)
> >>     Limiting data.table to '2'.
> >>     > dang::limitDataTableCores(1, verbose=TRUE)
> >>     Limiting data.table to '1'.
> >>     >
> >>
> >> That makes it, in my eyes, preferable to any unconditional 'always pick 1
> >> thread'.
> >>
> >> Dirk
> >>
> >>
> >> ##' Set threads for data.table respecting possible local settings
> >> ##'
> >> ##' This function sets the number of threads \pkg{data.table} will use
> >> ##' while reflecting two possible machine-specific settings from the
> >> ##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R
> >> ##' option \sQuote{Ncpus} (used e.g. for parallel builds).
> >> ##' @title Set data.table threads respecting default settings
> >> ##' @param ncores A numeric or character variable with the desired
> >> ##' count of threads to use
> >> ##' @param verbose A logical value with a default of \sQuote{FALSE} to
> >> ##' operate more verbosely
> >> ##' @return The return value of the \pkg{data.table} function
> >> ##' \code{setDTthreads} which is called as a side-effect.
> >> ##' @author Dirk Eddelbuettel
> >> ##' @export
> >> limitDataTableCores <- function(ncores, verbose = FALSE) {
> >>     if (missing(ncores)) {
> >>         ## start with a simple fallback: 'Ncpus' (if set) or else 2
> >>         ncores <- getOption("Ncpus", 2L)
> >>         ## also consider OMP_THREAD_LIMIT (cf Writing R Extensions),
> >>         ## gets NA if envvar unset
> >>         ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT"))
> >>         ## and then keep the smaller
> >>         ncores <- min(na.omit(c(ncores, ompcores)))
> >>     }
> >>     stopifnot("Package 'data.table' must be installed." =
> >>               requireNamespace("data.table", quietly=TRUE))
> >>     stopifnot("Argument 'ncores' must be numeric or character" =
> >>               is.numeric(ncores) || is.character(ncores))
> >>     if (verbose) message("Limiting data.table to '", ncores, "'.")
> >>     data.table::setDTthreads(ncores)
> >> }
> >>
> >> |
> >> | --
> >> | Best regards,
> >> | Ivan
> >> |
> >> | ______________________________________________
> >> | R-package-devel@r-project.org mailing list
> >> | https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >>
> >> --
> >> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
> >>
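
As a rough illustration of the two points raised in the quoted messages (20 processes times 20 threads giving 400, and restoring the thread count after a temporary cap), here is a minimal sketch. The helper names `threads_per_worker` and `with_dt_threads` are hypothetical and not part of data.table, parallel, or dang. The only data.table call assumed is `setDTthreads()`, which is documented to return the previous setting.

```r
## Hypothetical helper: split a core allocation evenly across worker
## processes, never going below one thread per worker.
threads_per_worker <- function(total_cores, n_workers) {
    stopifnot(total_cores >= 1, n_workers >= 1)
    max(1L, as.integer(total_cores %/% n_workers))
}

## Hypothetical helper: evaluate 'expr' with data.table capped at 'n'
## threads, then restore the previous setting even if 'expr' fails.
## If data.table is not installed, 'expr' is simply evaluated as is.
with_dt_threads <- function(n, expr) {
    if (requireNamespace("data.table", quietly = TRUE)) {
        old <- data.table::setDTthreads(n)   # returns the previous value
        on.exit(data.table::setDTthreads(old), add = TRUE)
    }
    expr   # the promise is forced here, while the cap is in effect
}

threads_per_worker(20, 20)   # 1 thread each: 20 threads in total, not 400
threads_per_worker(96, 4)    # 24 threads each
```

Inside each worker started via parallel::makeCluster(), one could then wrap the data.table work as `with_dt_threads(threads_per_worker(20, 20), ...)`, which avoids comparing the input against a test dataset in every exported function.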