Hi,

it came to our attention[0] that most R packages ship data files (*.Rda, *.Rdata), which can contain many different kinds of data, from command-line instructions to huge data tables, or even extra modules loaded by means of the install.packages() function.
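To see which of these kinds of data a given file actually holds, it can be inspected before it is loaded into a user's workspace. The following is a minimal sketch in R. inspect_rda() is a hypothetical helper, not part of base R or any package. It restores the file into a private environment, so restored objects cannot mask functions such as print() in the global environment. It then reports each object's class, whether it is a function, and which namespaces the load pulled in. It is not a sandbox: as the sig example in this mail shows, reading a file can still load namespaces (and run their load hooks).

```r
# inspect_rda() is a hypothetical helper, not part of base R or any package.
# It restores an .Rda/.RData file into a private environment and reports
# what it contains, instead of dumping everything into the global workspace.
inspect_rda <- function(path) {
  # Private environment with an empty parent: restored objects live only
  # here, so a saved 'print' cannot mask base::print for the user.
  env <- new.env(parent = emptyenv())
  before <- loadedNamespaces()
  # load() invisibly returns the names of the restored objects.
  objs <- load(path, envir = env)
  first_class <- function(nm) class(get(nm, envir = env))[1]
  is_fun <- function(nm) is.function(get(nm, envir = env))
  list(
    objects = data.frame(
      name = objs,
      class = vapply(objs, first_class, character(1), USE.NAMES = FALSE),
      is_function = vapply(objs, is_fun, logical(1), USE.NAMES = FALSE),
      stringsAsFactors = FALSE
    ),
    # Namespaces loaded as a side effect of reading the file; this is why
    # loading into a private environment is not a sandbox.
    new_namespaces = setdiff(loadedNamespaces(), before)
  )
}
```

Any row with is_function set to TRUE, or any non-empty new_namespaces, is a hint that the file carries code rather than plain data and deserves a closer look.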
It is common practice for R packages to fully document the content of their data files in the .Rd files shipped in the source tarball[1], which makes it easier to determine what kind of information those data files provide. However, data files can also contain modules loaded at runtime, for which the package usually ships no corresponding source code (there may be no source anywhere, if the object was modified and saved without keeping the source file), and they can contain malicious code as well. This is a very extreme corner case, but you cannot know in advance whether a given file is affected.

This is an example of an R library without source code:

    > install.packages("sig")
    [snip]
    > library("sig")
    > save(sig, file="mydata")
    >

When users load the data file, they have a sourceless library in their environment:

    > before <- loadedNamespaces()
    > load("mydata")
    > setdiff(loadedNamespaces(), before)
    [1] "sig"
    >

This is an example of malicious code:

    > old_print <- print
    > print <- function(...)
    + {
    +   unlink('the_most_important_file.txt')
    +   old_print('Say goodbye to your file!')
    + }
    > save.image("mydata")
    >

When users load the data file and try to execute a simple print statement, their files can be removed:

    > load("mydata")
    > list.files()
    [1] "mydata" "the_most_important_file.txt"
    > print('Hello world!')
    [1] "Say goodbye to your file!"
    > list.files()
    [1] "mydata"

This just shows that there are cases where .Rda files are *not* the preferred form of modification, for example when code (or even whole libraries) is placed into *.Rda files to be loaded. Therefore, we shall consider these data files the preferred form of modification if the data was captured in this format from a scientific instrument, created manually and painstakingly by hand (this is not the common case), or otherwise not generated. If the data was generated or converted by a script or series of scripts, the .Rda file is likely not the preferred form, and it needs to be rebuilt at build time from source (as we do with any binary in the archive).
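As an illustration of such a rebuild, here is a minimal sketch in R, assuming the data originates as a plain-text CSV file. The function name build_rda() and the file layout are hypothetical; neither R nor any Debian tooling provides them. The point is that the CSV stays the preferred form of modification and the .Rda becomes a build product.

```r
# build_rda() is a hypothetical helper: it regenerates a binary .rda file
# from a plain-text CSV source, so the .rda is produced at build time
# instead of being shipped as the only copy of the data.
build_rda <- function(csv_path, rda_path, object_name) {
  env <- new.env()
  # Read the human-editable source and bind it under the desired name.
  assign(object_name,
         utils::read.csv(csv_path, stringsAsFactors = FALSE),
         envir = env)
  # Create the output directory (e.g. data/) if it does not exist yet.
  dir.create(dirname(rda_path), showWarnings = FALSE, recursive = TRUE)
  # Save only that one object; xz gives small files for tabular data.
  save(list = object_name, envir = env, file = rda_path, compress = "xz")
  invisible(rda_path)
}
```

For example, build_rda("data-raw/example.csv", "data/example.rda", "example") would regenerate data/example.rda from its CSV source (both paths are hypothetical). A packaging build step could run it with Rscript before the package itself is built.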
[0] http://lists.debian.org/<20130805005735.ge22...@falafel.plessy.net>
[1] http://cran.r-project.org/doc/manuals/R-exts.html#Documenting-data-sets

-- 
bye, Joerg
on behalf of the FTP Team