I would like to suggest a patch against R-devel-2007-07-24, which modifies function sweep by including a warning, if dim(STATS) is not consistent with dim(x)[MARGIN]. If check.margin=FALSE, the simple test whether prod(dim(x)[MARGIN]) is a multiple of length(STATS) is performed. If check.margin=TRUE, then a more restrictive test is used, but a limited recycling is still allowed without warning. Besides generating a warning in some situations, there is no other change in the behavior of sweep. The patch is:
--- R-devel_2007-07-24/src/library/base/R/sweep.R 2007-01-04 17:51:53.000000000 +0100 +++ R-devel_2007-07-24-sweep/src/library/base/R/sweep.R 2007-07-24 10:56:18.000000000 +0200 @@ -1,7 +1,21 @@ -sweep <- function(x, MARGIN, STATS, FUN = "-", ...) +sweep <- function(x, MARGIN, STATS, FUN = "-", check.margin=FALSE, ...) { FUN <- match.fun(FUN) dims <- dim(x) + if (check.margin) { + dimmargin <- dims[MARGIN] + dimmargin <- dimmargin[dimmargin > 1] + dimstats <- if (is.null(dim(STATS))) length(STATS) else dim(STATS) + dimstats <- dimstats[dimstats > 1] + if (length(dimstats) > length(dimmargin) || + any(dimstats != dimmargin[seq(along.with=dimstats)])) + warning("length(STATS) or dim(STATS) do not match dim(x)[MARGIN]") + } else if (prod(dims[MARGIN]) %% length(STATS) != 0) { + if (length(MARGIN) == 1) + warning("dim(x)[MARGIN] is not a multiple of length(STATS)") + else + warning("prod(dim(x)[MARGIN]) is not a multiple of length(STATS)") + } perm <- c(MARGIN, (1:length(dims))[ - MARGIN]) FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) } The patch uses the default check.margin=FALSE, since this is more backward compatible. Changing the default to check.margin=TRUE would also be fine with me and also with Ben Bolker, who told me this in a separate email. Let me include more comments on the stricter test. If check.margin=TRUE, then the patch tests whether (after deleting possible dimensions with only one level) dim(STATS) is a prefix of dim(x)[MARGIN]. Hence, for example, if dim(x)[MARGIN] = c(k1,k2), the cases length(STATS) = 1, dim(STATS) = k1, dim(STATS) = NULL and length(STATS) = k1, dim(STATS) = c(k1,k2) are accepted without warning. On the other hand, if k1 != k2, then, for example, dim(STATS)= k2, dim(STATS) = c(k2,k1) generate a warning, although the simple divisibility condition length(STATS) divides prod(dim(x)[MARGIN]) is satisfied. The warning is generated, since in the last two cases, recycling produces incorrect or at least suspicious result. In the simplest case, when length(MARGIN)=1 and STATS is a vector, the cases accepted by the stricter test without warning are exactly the following two: length(STATS) = 1, length(STATS) = dim(x)[MARGIN]. I tested the patch using the script http://www.cs.cas.cz/~savicky/R-devel/verify_sweep1.R Ben Bolker also tested the patch in his environment. I appreciate to know the opinion of R core developers on this patch. Thank you in advance. Petr. ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel