On Thu, 17 Apr 2008, Alex Brown wrote:

> Adding a simplify argument to by would suit me fine.
>
> In my (limited) experience in using R, the automatic simplification that R
> does in various situations is one of its most troublesome features. It
> means that I cannot expect a program to work even if I give it data of the
> same types as I always have before; any time a dimension is reduced to 1 bad
> things happen.
>
> Is there a master switch I can set so dropping never happens automatically?

No, and you would break a lot of code by such a switch. Which is why we
are very much against having global options.

> Can you please have an option that by reads so I can indicate that by should
> never drop/simplify?

No, as it will break lots of other people's code. You can have your own
version, and then namespaces will protect other code from your changes.

>
> -Alex
>
> On 17 Apr 2008, at 07:03, Prof Brian Ripley wrote:
>
>> Unfortunately your proposed change changes the type of the output:
>> simplification is intended in many applications of by().
>>
>> Before:
>>
>> > str(by(mytimes$date[1], mytimes$set[1], function(x)x))
>> by [, 1] 1.21e+09
>> - attr(*, "dimnames")=List of 1
>> ..$ mytimes$set[1]: chr "1"
>> - attr(*, "call")= language by.default(data = mytimes$date[1], INDICES =
>> mytimes$set[1], FUN = function(x) x)
>>
>> After:
>>
>> > str(by(mytimes$date[1], mytimes$set[1], function(x)x))
>> List of 1
>> $ 1: POSIXct[1:1], format: "2008-04-17 06:53:31"
>> - attr(*, "dim")= int 1
>> - attr(*, "dimnames")=List of 1
>> ..$ mytimes$set[1]: chr "1"
>> - attr(*, "call")= language by.default(data = mytimes$date[1], INDICES =
>> mytimes$set[1], FUN = function(x) x)
>> - attr(*, "class")= chr "by"
>>
>> c() does not do the same thing as unlist() in general, and it is untrue
>> that 'c does not strip class'. What happens in your example is that there
>> is a c() method for your class (and not many others).
>>
>> What we could do is add a 'simplify' argument to by() so you can control
>> the simplification.
>>
>>
>> On Tue, 15 Apr 2008, Alex Brown wrote:
>>
>>> summary:
>>>
>>> The function 'by' inconsistently strips class from the data to which
>>> it is applied.
>>>
>>> quick reason:
>>>
>>> tapply strips class when simplify is set to TRUE (the default) due to
>>> the class stripping behaviour of unlist.
>>>
>>> quick answer:
>>>
>>> This can be fixed by invoking tapply with simplify=FALSE, or changing
>>> tapply to use do.call(c instead of unlist
>>>
>>> executable example:
>>>
>>> mytimes=data.frame(date = 1:3 + Sys.time(), set = c(1,1,2))
>>>
>>> by(mytimes$date, mytimes$set, function(x)x)
>>>
>>> INDICES: 1
>>> [1] "2008-04-15 11:41:38 BST" "2008-04-15 11:41:39 BST"
>>> ----------------------------------------------------------------------------------------
>>> INDICES: 2
>>> [1] "2008-04-15 11:41:40 BST"
>>>
>>> by(mytimes[1,]$date, mytimes[1,]$set, function(x)x)
>>>
>>> INDICES: 1
>>> [1] 1208256099
>>>
>>> why this is a problem:
>>>
>>> This is a problem when you are feeding the output of this by into a
>>> function which expects the class to be maintained. I see this problem
>>> when constructing
>>>
>>> reason:
>>>
>>> tapply strips class when simplify is set to TRUE (the default) due to
>>> the behaviour of unlist:
>>>
>>> "Where possible the list elements are coerced to a common mode during
>>> the unlisting, and so the result often ends up as a character vector.
>>> Vectors will be coerced to the highest type of the components in the
>>> hierarchy NULL < raw < logical < integer < real < complex < character
>>> < list < expression: pairlists are treated as lists."
>>>
>>> solution:
>>>
>>> This problem can be fixed in the function by.data.frame by modifying
>>> the call to tapply in the function "by":
>>>
>>> by.data.frame = function (data, INDICES, FUN, ...)
>>> {
>>>     if (!is.list(INDICES)) {
>>>         IND <- vector("list", 1)
>>>         IND[[1]] <- INDICES
>>>         names(IND) <- deparse(substitute(INDICES))[1]
>>>     }
>>>     else IND <- INDICES
>>>     FUNx <- function(x) FUN(data[x, ], ...)
>>>     nd <- nrow(data)
>>> <<<<
>>>     ans <- eval(substitute(tapply(1:nd, IND, FUNx)), data)
>>> ====
>>>     ans <- eval(substitute(tapply(1:nd, IND, FUNx, simplify=FALSE)),
>>>         data)
>>> >>>>
>>>     attr(ans, "call") <- match.call()
>>>     class(ans) <- "by"
>>>     ans
>>> }
>>>
>>> alternative solution:
>>>
>>> the call in tapply to unlist(ans, recursive=F) can be replaced by
>>> do.call(c,ans, recursive=F) to fix this issue, since c does not strip
>>> class.
>>>
>>> However, I haven't taken the time to work out if this will work in all
>>> cases.
>>>
>>> for example:
>>>
>>> function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
>>> {
>>>     FUN <- if (!is.null(FUN))
>>>         match.fun(FUN)
>>>     if (!is.list(INDEX))
>>>         INDEX <- list(INDEX)
>>>     nI <- length(INDEX)
>>>     namelist <- vector("list", nI)
>>>     names(namelist) <- names(INDEX)
>>>     extent <- integer(nI)
>>>     nx <- length(X)
>>>     one <- 1L
>>>     group <- rep.int(one, nx)
>>>     ngroup <- one
>>>     for (i in seq.int(INDEX)) {
>>>         index <- as.factor(INDEX[[i]])
>>>         if (length(index) != nx)
>>>             stop("arguments must have same length")
>>>         namelist[[i]] <- levels(index)
>>>         extent[i] <- nlevels(index)
>>>         group <- group + ngroup * (as.integer(index) - one)
>>>         ngroup <- ngroup * nlevels(index)
>>>     }
>>>     if (is.null(FUN))
>>>         return(group)
>>>     ans <- lapply(split(X, group), FUN, ...)
>>>     index <- as.integer(names(ans))
>>>     if (simplify && all(unlist(lapply(ans, length)) == 1)) {
>>>         ansmat <- array(dim = extent, dimnames = namelist)
>>> <<<<
>>>         ans <- unlist(ans, recursive = FALSE)
>>> ====
>>>         ans <- do.call(c, ans, recursive = FALSE)
>>> >>>>
>>>     }
>>>     else {
>>>         ansmat <- array(vector("list", prod(extent)), dim = extent,
>>>             dimnames = namelist)
>>>     }
>>>     if (length(index)) {
>>>         names(ans) <- NULL
>>>         ansmat[index] <- ans
>>>     }
>>>     ansmat
>>> }
>>>
>>> Alexander Brown
>>> Principal Engineer
>>> Transitive
>>> Maybrook House, 40 Blackfriars Street, Manchester M3 2EG
>>> Phone: +44 (0)161 836 2321 Fax: +44 (0)161 836 2399 Mobile: +44
>>> (0)7980 708 221
>>> www.transitive.com
>>> * The leader in cross-platform virtualization
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Brian D. Ripley, [EMAIL PROTECTED]
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595
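
For anyone following the thread, here is a minimal sketch of the "have your own version" route, assuming all that is wanted is grouping without simplification. The helper name by_list is made up for illustration and is not part of base R; it only forwards to tapply() with simplify = FALSE. The last lines illustrate why unlist() and c() behave differently for this particular class: c() keeps it only because a c() method exists for POSIXct.

```r
# Hypothetical helper (not in base R): apply FUN per group and always
# return a list array, so single-element groups are never simplified.
by_list <- function(x, groups, FUN, ...) {
  tapply(x, groups, FUN, ..., simplify = FALSE)
}

# Example data modelled on the thread; a fixed time keeps it reproducible.
times <- as.POSIXct("2008-04-15 11:41:38", tz = "UTC") + 0:2
sets  <- c(1, 1, 2)

# Several elements per group: each list element keeps its POSIXct class.
res_all <- by_list(times, sets, function(x) x)
print(class(res_all[[1]]))

# A single element in a single group: still a list, class still kept.
res_one <- by_list(times[1], sets[1], function(x) x)
print(class(res_one[[1]]))

# unlist() drops the class attribute; c() dispatches to the POSIXct method.
stripped <- unlist(list(times[1]))
kept     <- do.call(c, list(times[1]))
print(class(stripped))
print(class(kept))
```

Because the helper lives in your own code, nothing that relies on the default simplification in by() or tapply() is affected.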