The R.cache package on CRAN can be used for this purpose. It works on all platforms. Per the CRAN Policies, it will prompt the user (in an interactive session) whether they wish to use a persistent cache folder or to fall back to a temporary one. For example,
> path <- R.cache::getCachePath(dirs = "MyDataPkg")
The R.cache package needs to create a directory that will hold cache
files. It is convenient to use '/home/hb/.cache/R/R.cache' because it
follows the standard on your operating system and it also remains after
restarting R.

Do you wish to create the '/home/hb/.cache/R/R.cache' directory? If not,
a temporary directory (/tmp/hb/RtmpvEgWIr/.Rcache) that is specific to
this R session will be used. [Y/n]:
> path
[1] "/home/hb/.cache/R/R.cache/MyDataPkg"

Once the user has accepted this, the folder is created and will be
available in all future R sessions. That is, the next time they start R
there will be no prompt:

> path <- R.cache::getCachePath(dirs = "MyDataPkg")
> path
[1] "/home/hb/.cache/R/R.cache/MyDataPkg"

This is also the case in non-interactive sessions. If that folder does
not exist in a non-interactive session, a temporary folder is used
instead (effectively making the cache lifetime equal to the session
lifetime). 'R CMD check' will always use a temporary cache so that there
is no memory between checks.

/Henrik (disclaimer: I'm the author)

On Sun, Dec 15, 2019 at 9:27 AM <b...@denney.ws> wrote:
>
> Hi Uwe,
>
> Thanks for this information, and it makes sense to me. Is there a
> preferred way to cache the data locally?
>
> None of the ways that I can think of to cache the data sound
> particularly good, and I wonder if I'm missing something. The ideas
> that occur to me are:
>
> 1. Download them into the package directory `path.package("datapkg")`,
> but that would require an action to be performed on package
> installation, and I'm unaware of any way to trigger an action on
> installation.
> 2. Have a user-specified cache directory (e.g.
> `options("datapkg_cache"="/my/cache/location")`), but that would
> require interaction with every use. (Not horrible, but it will likely
> significantly increase the number of user issues with the package.)
> 3. Have a user-specified cache directory like #2, but have it default
> to somewhere in their home directory like
> `file.path(Sys.getenv("HOME"), "datapkg_cache")` if they have not set
> the option.
>
> To me #3 sounds best, but I'd like to be sure that I'm not missing
> something.
>
> Thanks,
>
> Bill
>
> -----Original Message-----
> From: Uwe Ligges <lig...@statistik.tu-dortmund.de>
> Sent: Sunday, December 15, 2019 11:54 AM
> To: b...@denney.ws; r-package-devel@r-project.org
> Subject: Re: [R-pkg-devel] Large Data Package CRAN Preferences
>
> Ideally you would host the data elsewhere and submit a CRAN package
> that allows users to easily get/merge/aggregate the data.
>
> Best,
> Uwe Ligges
>
>
> On 12.12.2019 20:55, b...@denney.ws wrote:
> > Hello,
> >
> > I have two questions about creating data packages for data that will
> > be updated and in total are >5 MB in size.
> >
> > The first question is:
> >
> > The CRAN policy indicates that packages should generally be ≤5 MB in
> > size. Within a package that I'm working on, I need access to data
> > that are updated approximately quarterly, including the historical
> > datasets (specifically, these are the SDTM and CDASH terminologies
> > in https://evs.nci.nih.gov/ftp1/CDISC/SDTM/Archive/).
> >
> > Current individual data updates are approximately 1 MB when
> > individually saved as .RDS, and the total current set is about
> > 20 MB. Since there will be future updates, I think the preferred
> > approach is to generate one data package for each update and then
> > have an umbrella package that depends on each of the individual data
> > update packages. That seems like it will minimize space requirements
> > on CRAN, since old data will probably never need to be updated
> > (though I will need to access it).
> >
> > Is that an accurate summary of the best practice for creating these
> > as a data package?
> >
> > And a second question is:
> >
> > Assuming the best practice is the one I described above, the typical
> > need will be to combine the individual historical datasets for local
> > use. An initial test indicates that combining the data would take
> > about 1 minute, but after combination, the result could be loaded
> > faster. I'd like to store the combined dataset locally with the
> > umbrella package. I believe that it is considered poor form to write
> > within the library location for a package except during
> > installation.
> >
> > What is the best practice for caching the resulting large,
> > locally generated dataset?
> >
> > Thanks,
> >
> > Bill

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
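P.S. The R.cache approach above could be sketched for this use case
roughly as follows. This is an untested sketch, not a definitive
implementation: `build_combined_data()` and the "MyDataPkg" cache
subdirectory are hypothetical placeholders, while `loadCache()` and
`saveCache()` are R.cache's actual load/store functions.

```r
library(R.cache)

## Hypothetical stand-in for the ~1-minute merge of the historical
## terminology datasets; replace with the real combination step.
build_combined_data <- function() {
  data.frame(code = c("C12345", "C67890"),
             term = c("example A", "example B"))
}

get_combined_data <- function() {
  ## The key identifies the cache entry; bump it if the format changes.
  key <- list(what = "combined-terminology")
  data <- loadCache(key = key, dirs = "MyDataPkg")
  if (!is.null(data)) {
    return(data)                     # cache hit: skip the slow merge
  }
  data <- build_combined_data()      # cache miss: rebuild ...
  saveCache(data, key = key, dirs = "MyDataPkg")  # ... and store it
  data
}
```

On the first call this builds the combined dataset and stores it under
the user-approved cache directory (or a session-temporary one if the
user declined); later calls, including in later R sessions, load it from
disk instead of writing into the package's library location.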