On Thu, Sep 5, 2013 at 6:48 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote: > On 13-09-04 11:36 PM, Scott Kostyshak wrote: >> >> On Wed, Sep 4, 2013 at 1:53 PM, Duncan Murdoch <murdoch.dun...@gmail.com> >> wrote: >>> >>> In a number of places internal to R, we need to know which files have >>> changed (e.g. after building a vignette). I've just written a general >>> purpose function "changedFiles" that I'll probably commit to R-devel. >>> Comments on the design (or bug reports) would be appreciated. >>> >>> The source for the function and the Rd page for it are inline below. >> >> >> This looks like a useful function. Thanks for writing it. I have only >> one (picky) comment below. >> >>> ----- changedFiles.R: >>> changedFiles <- function(snapshot, timestamp = tempfile("timestamp"), >>> file.info = NULL, >>> md5sum = FALSE, full.names = FALSE, ...) { >>> dosnapshot <- function(args) { >>> fullnames <- do.call(list.files, c(full.names = TRUE, args)) >>> names <- do.call(list.files, c(full.names = full.names, args)) >>> if (isTRUE(file.info) || (is.character(file.info) && >>> length(file.info))) { >>> info <- file.info(fullnames) >>> rownames(info) <- names >>> if (isTRUE(file.info)) >>> file.info <- c("size", "isdir", "mode", "mtime") >>> } else >>> info <- data.frame(row.names=names) >>> if (md5sum) >>> info <- data.frame(info, md5sum = tools::md5sum(fullnames)) >>> list(info = info, timestamp = timestamp, file.info = file.info, >>> md5sum = md5sum, full.names = full.names, args = args) >>> } >>> if (missing(snapshot) || !inherits(snapshot, >>> "changedFilesSnapshot")) { >>> if (length(timestamp) == 1) >>> file.create(timestamp) >>> if (missing(snapshot)) snapshot <- "." >>> pre <- dosnapshot(list(path = snapshot, ...)) >>> pre$pre <- pre$info >>> pre$info <- NULL >>> pre$wd <- getwd() >>> class(pre) <- "changedFilesSnapshot" >>> return(pre) >>> } >>> >>> if (missing(timestamp)) timestamp <- snapshot$timestamp >>> if (missing(file.info) || isTRUE(file.info)) file.info <- >>> snapshot$file.info >>> if (identical(file.info, FALSE)) file.info <- NULL >>> if (missing(md5sum)) md5sum <- snapshot$md5sum >>> if (missing(full.names)) full.names <- snapshot$full.names >>> >>> pre <- snapshot$pre >>> savewd <- getwd() >>> on.exit(setwd(savewd)) >>> setwd(snapshot$wd) >>> >>> args <- snapshot$args >>> newargs <- list(...) >>> args[names(newargs)] <- newargs >>> post <- dosnapshot(args)$info >>> prenames <- rownames(pre) >>> postnames <- rownames(post) >>> >>> added <- setdiff(postnames, prenames) >>> deleted <- setdiff(prenames, postnames) >>> common <- intersect(prenames, postnames) >>> >>> if (length(file.info)) { >>> preinfo <- pre[common, file.info] >>> postinfo <- post[common, file.info] >>> changes <- preinfo != postinfo >>> } >>> else changes <- matrix(logical(0), nrow = length(common), ncol = 0, >>> dimnames = list(common, character(0))) >>> if (length(timestamp)) >>> changes <- cbind(changes, Newer = file_test("-nt", common, >>> timestamp)) >>> if (md5sum) { >>> premd5 <- pre[common, "md5sum"] >>> postmd5 <- post[common, "md5sum"] >>> changes <- cbind(changes, md5sum = premd5 != postmd5) >>> } >>> changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = >>> FALSE] >>> changed <- rownames(changes1) >>> structure(list(added = added, deleted = deleted, changed = changed, >>> unchanged = setdiff(common, changed), changes = changes), class >>> = >>> "changedFiles") >>> } >>> >>> print.changedFilesSnapshot <- function(x, ...) { >>> cat("changedFiles snapshot:\n timestamp = \"", x$timestamp, "\"\n >>> file.info = ", >>> if (length(x$file.info)) paste(paste0('"', x$file.info, '"'), >>> collapse=","), >>> "\n md5sum = ", x$md5sum, "\n args = ", deparse(x$args, control >>> = >>> NULL), "\n", sep="") >>> x >>> } >>> >>> print.changedFiles <- function(x, ...) { >>> if (length(x$added)) cat("Files added:\n", paste0(" ", x$added, >>> collapse="\n"), "\n", sep="") >>> if (length(x$deleted)) cat("Files deleted:\n", paste0(" ", >>> x$deleted, >>> collapse="\n"), "\n", sep="") >>> changes <- x$changes >>> changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop=FALSE] >>> changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop=FALSE] >>> if (nrow(changes)) { >>> cat("Files changed:\n") >>> print(changes) >>> } >>> x >>> } >>> ---------------------- >>> >>> --- changedFiles.Rd: >>> \name{changedFiles} >>> \alias{changedFiles} >>> \alias{print.changedFiles} >>> \alias{print.changedFilesSnapshot} >>> \title{ >>> Detect which files have changed >>> } >>> \description{ >>> On the first call, \code{changedFiles} takes a snapshot of a selection of >>> files. In subsequent >>> calls, it takes another snapshot, and returns an object containing data >>> on >>> the >>> differences between the two snapshots. The snapshots need not be the >>> same >>> directory; >>> this could be used to compare two directories. >>> } >>> \usage{ >>> changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info = >>> NULL, >>> md5sum = FALSE, full.names = FALSE, ...) >>> } >>> \arguments{ >>> \item{snapshot}{ >>> The path to record, or a previous snapshot. See the Details. >>> } >>> \item{timestamp}{ >>> The name of a file to write at the time the initial snapshot >>> is taken. In subsequent calls, modification times of files will be >>> compared >>> to >>> this file, and newer files will be reported as changed. Set to >>> \code{NULL} >>> to skip this test. >>> } >>> \item{file.info}{ >>> A vector of columns from the result of the \code{file.info} function, or >>> a >>> logical value. If >>> \code{TRUE}, columns \code{c("size", "isdir", "mode", "mtime")} will be >>> used. Set to >>> \code{FALSE} or \code{NULL} to skip this test. See the Details. >>> } >>> \item{md5sum}{ >>> A logical value indicating whether MD5 summaries should be taken as part >>> of >>> the snapshot. >>> } >>> \item{full.names}{ >>> A logical value indicating whether full names (as in >>> \code{\link{list.files}}) should be >>> recorded. >>> } >>> \item{\dots}{ >>> Additional parameters to pass to \code{\link{list.files}} to control the >>> set >>> of files >>> in the snapshots. >>> } >>> } >>> \details{ >>> This function works in two modes. If the \code{snapshot} argument is >>> missing or is >>> not of S3 class \code{"changedFilesSnapshot"}, it is used as the >>> \code{path} >>> argument >>> to \code{\link{list.files}} to obtain a list of files. If it is of class >>> \code{"changedFilesSnapshot"}, then it is taken to be the baseline file >>> and a new snapshot is taken and compared with it. In the latter case, >>> missing >>> arguments default to match those from the initial snapshot. >>> >>> If the \code{timestamp} argument is length 1, a file with that name is >>> created >>> in the current directory during the initial snapshot, and >>> \code{\link{file_test}} >>> is used to compare the age of all files to it during subsequent calls. >>> >>> If the \code{file.info} argument is \code{TRUE} or it contains a >>> non-empty >>> character vector, the indicated columns from the result of a call to >>> \code{\link{file.info}} will be recorded and compared. >>> >>> If \code{md5sum} is \code{TRUE}, the \code{tools::\link{md5sum}} function >>> will be called to record the 32 byte MD5 checksum for each file, and >>> these >>> values >>> will be compared. >>> } >>> \value{ >>> In the initial snapshot phase, an object of class >>> \code{"changedFilesSnapshot"} is returned. This >>> is a list containing the fields >>> \item{pre}{a dataframe whose rownames are the filenames, and whose >>> columns >>> contain the >>> requested snapshot data} >>> \item{timestamp, file.info, md5sum, full.names}{a record of the arguments >>> in >>> the initial call} >>> \item{args}{other arguments passed via \code{...} to >>> \code{\link{list.files}}.} >>> >>> In the comparison phase, an object of class \code{"changedFiles"}. This >>> is a >>> list containing >>> \item{added, deleted, changed, unchanged}{character vectors of filenames >>> from the before >>> and after snapshots, with obvious meanings} >>> \item{changes}{a logical matrix with a row for each common file, and a >>> column for each >>> comparison test. \code{TRUE} indicates a change in that test.} >>> >>> \code{\link{print}} methods are defined for each of these types. The >>> \code{\link{print}} method for \code{"changedFilesSnapshot"} objects >>> displays the arguments used to produce it, while the one for >>> \code{"changedFiles"} displays the \code{added}, \code{deleted} >>> and \code{changed} fields if non-empty, and a submatrix of the >>> \code{changes} >>> matrix containing all of the \code{TRUE} values. >>> } >>> \author{ >>> Duncan Murdoch >>> } >>> \seealso{ >>> \code{\link{file.info}}, \code{\link{file_test}}, \code{\link{md5sum}}. >>> } >>> \examples{ >>> # Create some files in a temporary directory >>> dir <- tempfile() >>> dir.create(dir) >> >> >> Should a different name than 'dir' be used since 'dir' is a base function? > > > Such as?
'dir_', 'dir1', 'temp_dir', none of which is a base function. I thought that it was not recommended to create objects with the same name as functions, but perhaps this recommended practice is not agreed on. >> Further, if someone is not very familiar with R (or just not in "R >> mode" at the time of reading), they might think that 'dir.create' is >> calling the create member of the object named 'dir' that you just >> made. > > > dir.create is an existing function. I wouldn't have named it that, but > that's its name. I meant that if the object is called, e.g. 'temp_dir', one will not think that 'dir.create' is a call to the 'create' member of 'dir' because there is no 'dir' object apart from the base function. But anyone with experience in R would know that this is not how R parses 'dir.create'. In any case, I shouldn't waste your time on such a minor and subjective thing. Scott > Duncan Murdoch > > >> >> Scott >> >>> writeBin(1, file.path(dir, "file1")) >>> writeBin(2, file.path(dir, "file2")) >>> dir.create(file.path(dir, "dir")) >>> >>> # Take a snapshot >>> snapshot <- changedFiles(dir, file.info=TRUE, md5sum=TRUE) >>> >>> # Change one of the files >>> writeBin(3, file.path(dir, "file2")) >>> >>> # Display the detected changes >>> changedFiles(snapshot) >>> changedFiles(snapshot)$changes >>> } >>> \keyword{utilities} >>> \keyword{file} >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> >> >> -- >> Scott Kostyshak >> Economics PhD Candidate >> Princeton University >> >> ______________________________________________ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> -- Scott Kostyshak Economics PhD Candidate Princeton University ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel