When I was preparing the patch of sweep submitted on July 25, I was unaware of the code by Heather Turner. She suggested a very elegant solution, if STATS is a vector and we want to use meaningful recycling in full generality. I would like to suggest a combined solution, which uses Heather Turner's algorithm if check.margin=FALSE (default) and STATS is a vector and my previous algorithm, if check.margin=TRUE or STATS is an array. The suggestion is
# combined from the original code of sweep without warnings and from # https://stat.ethz.ch/pipermail/r-help/2005-June/073989.html by Robin Hankin # https://stat.ethz.ch/pipermail/r-help/2005-June/074001.html by Heather Turner # https://stat.ethz.ch/pipermail/r-devel/2007-June/046217.html by Ben Bolker # with some further modifications by Petr Savicky sweep <- function(x, MARGIN, STATS, FUN = "-", check.margin=FALSE, ...) { FUN <- match.fun(FUN) dims <- dim(x) dimmargin <- dims[MARGIN] if (is.null(dim(STATS))) { dimstats <- length(STATS) } else { dimstats <- dim(STATS) check.margin <- TRUE } s <- length(STATS) if (s > prod(dimmargin)) { warning("length of STATS greater than the extent of dim(x)[MARGIN]") } else if (check.margin) { dimmargin <- dimmargin[dimmargin > 1] dimstats <- dimstats[dimstats > 1] if (length(dimstats) > length(dimmargin) || any(dimstats != dimmargin[seq(along.with=dimstats)])) warning("length(STATS) or dim(STATS) do not match dim(x)[MARGIN]") } else { cumDim <- c(1, cumprod(dimmargin)) upper <- min(cumDim[cumDim >= s]) lower <- max(cumDim[cumDim <= s]) if (upper %% s != 0 || s %% lower != 0) warning("STATS does not recycle exactly across MARGIN") } perm <- c(MARGIN, (1:length(dims))[ - MARGIN]) FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) } Heather presented four examples testing her code: sweep(array(1:24, dim = c(4,3,2)), 1, 1:2) # no warning sweep(array(1:24, dim = c(4,3,2)), 1, 1:12) # no warning sweep(array(1:24, dim = c(4,3,2)), 1, 1:24) # no warning sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3) # warning The second and third example are not really correct, since STATS extends also to dimensions not included in MARGIN. The problem is better visible for example in sweep(array(1:24, dim = c(4,4,3,3,2,2)), c(1,3), 1:12) where MARGIN clearly has to contain two dimensions explicitly. So, I use the examples with a larger margin corresponding to STATS as follows sweep(array(1:24, dim = c(4,3,2)), 1, 1:2) # no warning sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:12) # no warning sweep(array(1:24, dim = c(4,3,2)), 1:3, 1:24) # no warning sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3) # warning The current proposal for sweep indeed gives no warning in the first three examples and gives a warning in the last one. I did not use the suggestion to call the option warn with default warn = getOption("warn"). The reason is that there are two different decisions: (1) whether to generate a warning (2) what to do with the warning, if it is generated. The warn option influences (2): the warning may be suppressed, printed after the return to the top level, printed immediately or it may be converted to an error. I think that the option controling (2) should not be mixed with an option which controls (1). If R has an option controling to which extent recycling is allowed, then this could be used, but warn has a different purpose. I appreciate feedback. Petr. ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel