This is a good point. I would prefer to include all the data in the
package, but CRAN has strict limitations on package and subdirectory
size, which the potential data would easily exceed. Whether it is an
active binding or a get function, dynamically downloaded data will
always suffer this problem. Also, there are potential copyright issues
which may prevent including all the relevant data in a package, no
matter how the package is distributed.
For this particular package of ICD data, the biggest risk is not the
data changing, but the data not being made available in the future, or
not being provided in a useful format.
I do allow the user to set the cache directory, which eventually
includes all the raw and processed data, and this could be archived by
the user for reproducibility. In addition, the test suite covers
potential changes to the source data.
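
As a rough illustration of archiving the cache for reproducibility, the sketch below records MD5 checksums of the files in a cache directory so that a later run can detect whether anything changed. The directory name and the sample file are hypothetical stand-ins, not the layout icd.data actually uses.

```r
# Hypothetical cache directory with one stand-in raw data file.
cache_dir <- file.path(tempdir(), "icd-demo-cache-archive")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
writeLines(c("A00", "A01"), file.path(cache_dir, "raw.txt"))

# Record a checksum for every cached file.
files <- list.files(cache_dir, full.names = TRUE)
manifest <- data.frame(
  file = basename(files),
  md5 = unname(tools::md5sum(files)),
  stringsAsFactors = FALSE
)

# Later: recompute the checksums and compare against the stored manifest.
unchanged <- identical(manifest$md5, unname(tools::md5sum(files)))
```

Storing such a manifest alongside an analysis would let a user confirm that the archived cache is the one the results were produced from.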
On 3/24/19 11:21 AM, Hong Ooi wrote:
Don't want to turn this into a pile-on, but I also think this isn't a very good idea. As
I understand it, accessing the symbol "foo" will pull the latest version of foo
from the remote site. This has consequences for reproducibility, because now your code
could be exactly the same, and your local environment exactly the same, and yet running
the code at different times can yield different results because the remote data has been
updated.
-----Original Message-----
From: R-package-devel <r-package-devel-boun...@r-project.org> On Behalf Of Jack
Wasey
Sent: Sunday, 24 March 2019 9:57 AM
To: Kirill Müller <krlmlr...@mailbox.org>; R Development
<r-package-devel@r-project.org>
Subject: Re: [R-pkg-devel] active bindings in package namespace
Thanks both, this is helpful advice.
On 3/23/19 5:14 PM, Kirill Müller wrote:
Dear Jack
This doesn't answer your question, but I would advise against this design.
- Users do not expect side effects (such as network access) from accessing a
symbol.
- A function gives you much more flexibility to change the interface
later on. (Arguments for fetching the data, tokens for API access,
...)
- You already encountered a few quirks that make this an "interesting" problem.
A function call only needs a pair of parentheses.
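
A minimal sketch of the function-based design suggested above, assuming a local cache. The function name, its arguments, and the stand-in download step are all hypothetical and are not part of icd.data.

```r
# Hypothetical getter: the data is fetched only when the function is called,
# and arguments leave room to extend the interface later.
get_icd_demo <- function(cache_dir = file.path(tempdir(), "icd-demo-cache"),
                         download = function(path) {
                           # Stand-in for a real download-and-parse step.
                           saveRDS(data.frame(code = c("A00", "A01")), path)
                         },
                         refresh = FALSE) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, "icd_demo.rds")
  # Fetch only if the cached copy is missing or a refresh is requested.
  if (refresh || !file.exists(path)) download(path)
  readRDS(path)
}

dat <- get_icd_demo()
```

Because the network access sits behind an explicit call, merely listing or inspecting the package namespace has no side effects.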
Best regards
Kirill
On 23.03.19 16:50, Jack O. Wasey wrote:
Dear all,
I am developing a package which is a front for various online data (icd.data
https://github.com/jackwasey/icd.data/ ). The current CRAN version just has
lazy-loaded data, but the package now encompasses far more current and historic
ICD codes from different countries; these can't be included in the CRAN package
even with maximal compression.
Other authors have solved this using functions to get the data, with or without
a local cache of the retrieved data. No CRAN or other packages I have found
after extensive searching use the attractive active binding feature of R.
The goal is simple: for the user to refer to the data by its symbol, e.g.,
'icd10fr2019', or 'icd.data::icd10fr2019', and it will be downloaded and parsed
transparently (if the user has already granted permission, or after prompt if
they haven't).
The bindings are set using commands alongside the function definitions in R/*.R,
e.g.
makeActiveBinding("icd10cm_latest", .icd10cm_latest_binding, environment())
lockBinding("icd10cm_latest", environment())
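
For readers who have not used active bindings, here is a small self-contained sketch of the same two calls in a throwaway environment. The binding function is a hypothetical stand-in that returns a tiny data frame instead of downloading anything.

```r
# Hypothetical stand-in for the real download-and-parse function.
.demo_binding <- function() {
  message("binding evaluated")
  data.frame(code = c("A00", "A01"), stringsAsFactors = FALSE)
}

env <- new.env()
makeActiveBinding("icd_demo", .demo_binding, env)
lockBinding("icd_demo", env)

# Accessing the symbol calls .demo_binding() each time.
x <- env$icd_demo
is_active <- bindingIsActive("icd_demo", env)
```

In a package, environment() evaluated at the top level of an R/*.R file refers to the package namespace, which is why the original calls pass it as the target.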
For non-interactive use, CI and CRAN tests, no data should be downloaded, and
no cache directory set up without user consent. For interactive use, I ask
permission to create a local data cache before downloading data.
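
The consent rule described above might look roughly like the sketch below. The option name icd.demo.consent and the function name are hypothetical; the package's real mechanism may differ.

```r
# Hypothetical consent check: never prompt or download in non-interactive
# sessions unless the user has explicitly opted in via an option.
has_consent <- function() {
  opt <- getOption("icd.demo.consent", default = NA)
  if (isTRUE(opt)) return(TRUE)
  if (identical(opt, FALSE) || !interactive()) return(FALSE)
  ans <- readline("Create a local cache and download data? [y/N] ")
  tolower(trimws(ans)) %in% c("y", "yes")
}
```

With this shape, CI and CRAN checks get FALSE without any prompt, while an interactive user is asked once per call unless the option is set.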
This works fine... until R CMD check. The following steps seem to 'get' or
'source' everything from the package namespace, which triggers the active
bindings. This fails when I am unable to get consent to download data and
want to 'stop' on that error condition.
- checking dependencies in R code
- checking S3 generic/method consistency
- checking foreign function calls
- checking R code for possible problems
Debugging CI-specific binding bugs is a nightmare because these occur in
different R sessions initiated by R CMD check.
There may be legitimate reasons to evaluate everything in the
namespace, but I've no idea what they are. Incidentally, RStudio also
does 'mget' on the whole package namespace and triggers bindings
during autocomplete. https://github.com/rstudio/rstudio/issues/4414
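
The effect is easy to reproduce outside a package. In the sketch below (all names hypothetical), listing the names in an environment leaves the active binding alone, but mget over those names evaluates it. The last lines show one way a tool could skip active bindings using bindingIsActive.

```r
env <- new.env()
counter <- new.env()
counter$n <- 0L

makeActiveBinding("remote_data", function() {
  counter$n <- counter$n + 1L   # record that the binding was evaluated
  "payload"
}, env)
assign("ordinary", 1, envir = env)

nms <- ls(env)                   # listing names does not evaluate anything
before <- counter$n
vals <- mget(nms, envir = env)   # fetching every value evaluates the binding
after <- counter$n

# A tool could skip active bindings when it only needs ordinary objects.
plain <- Filter(function(nm) !bindingIsActive(nm, env), nms)
safe_vals <- mget(plain, envir = env)
```

Comparing before and after shows whether the bulk fetch triggered the binding function.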
Is this something I should raise as an issue with R? Or does anyone have any
idea of a sensible approach to this? Currently I have a set of workarounds, but
this complicates the code, and has taken an awful lot of time. Does anyone know
of any CRAN package which has active bindings in the package namespace?
Any ideas appreciated.
Jack Wasey
______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel