Dear R friends I wanted a function to make a simple percent table that would be easy for students to use. The goal originally was to have a simple thing people would call like this
pctable(rowvar, colvar, data) and the things "rowvar" and "colvar" might be names of variables in data. I wanted to avoid the usage of "with" (as we now see in the table help). Then some people wanted more features, and I agreed with the suggestion to create a formula interface that people can call like so: pctable(rowvar ~ colvar, data) I end up with a generic function pctable and methods pctable.default, pctable.formula, pctable.character. I got that working, mostly I understand what's going on. Except the following, which, actually, is a good lesson to me about promises and method dispatch in R. An S3 generic will not send a call with a promise in the first argument to pctable.default (as I had mistakenly hoped). I'll paste in all the code below, but I think you will know the answer even without running it. pctable is a generic function. In workspace, I have no objects x and y, but there are variables inside data.frame dat named x and y. Since y is not an object, the method dispatch fails thus: > pctable(y, x, dat) Error in pctable(y, x, dat) (from #3) : object 'y' not found This direct call on pctable.default works (recall y and x are promises): > pctable.default(y, x, dat) Count (column %) x y 1 2 3 4 Sum A 5(20%) 3(12%) 5(20%) 6(24%) 19 B 9(36%) 5(20%) 4(16%) 6(24%) 24 C 1(4%) 6(24%) 3(12%) 2(8%) 12 D 4(16%) 4(16%) 6(24%) 5(20%) 19 E 6(24%) 7(28%) 7(28%) 6(24%) 26 Sum 25 25 25 25 100 All the methods work fine when the first argument is a language object. This works (dispatches to pctable.formula) > pctable(y ~ x, dat) Count (column %) x y 1 2 3 4 Sum A 5(20%) 3(12%) 5(20%) 6(24%) 19 B 9(36%) 5(20%) 4(16%) 6(24%) 24 C 1(4%) 6(24%) 3(12%) 2(8%) 12 D 4(16%) 4(16%) 6(24%) 5(20%) 19 E 6(24%) 7(28%) 7(28%) 6(24%) 26 Sum 25 25 25 25 100 This works (dispatches to pctable.default) > pctable(dat$y, dat$x) Count (column %) dat$x dat$y 1 2 3 4 Sum A 5(20%) 3(12%) 5(20%) 6(24%) 19 B 9(36%) 5(20%) 4(16%) 6(24%) 24 C 1(4%) 6(24%) 3(12%) 2(8%) 12 D 4(16%) 4(16%) 6(24%) 5(20%) 19 E 6(24%) 7(28%) 7(28%) 6(24%) 26 Sum 25 25 25 25 100 However, this fails because y is not an object with a type > pctable(y, x, dat) Error in pctable(y, x, dat) (from #3) : object 'y' not found Can R be tricked to send that call to pctable.default, where it does work? Here's the code, I'm working on documentation, will put in package rockchalk eventually, but hate to leave this problem until I fully understand it. pctable <- function(rv, ...) { UseMethod("pctable") } ## rv: row variable, quoted or not ## cv: column variable, quoted or not pctable.default <- function(rv, cv, data = parent.frame(), rvlab = NULL, cvlab = NULL, colpct = TRUE, rowpct = FALSE, exclude = c(NA, NaN), rounded = FALSE) { rvlabel <- if (!missing(rv)) deparse(substitute(rv)) cvlabel <- if (!missing(cv)) deparse(substitute(cv)) rvlab <- if (is.null(rvlab)) rvlabel else rvlab cvlab <- if (is.null(cvlab)) cvlabel else cvlab rvin <- eval(substitute(rv), envir = data, enclos = parent.frame()) cvin <- eval(substitute(cv), envir = data, enclos = parent.frame()) t1 <- table(rvin, cvin, dnn = c(rvlab, cvlab), exclude = exclude) rownames(t1)[is.na(rownames(t1))] <- "NA" ## symbol to letters colnames(t1)[is.na(colnames(t1))] <- "NA" if (rounded) t1 <- round(t1, -1) t2 <- addmargins(t1, c(1,2)) t1colpct <- round(100*prop.table(t1, 2), 1) t1rowpct <- round(100*prop.table(t1, 1), 1) t1colpct <- apply(t1colpct, c(1,2), function(x) gsub("NaN", "", x)) t1rowpct <- apply(t1rowpct, c(1,2), function(x) gsub("NaN", "", x)) res <- list("count" = t2, "colpct" = t1colpct, "rowpct" = t1rowpct, call = match.call()) class(res) <- "pctable" print(res, colpct = colpct, rowpct = rowpct) invisible(res) } pctable.formula <- function(formula, data = NULL, rvlab = NULL, cvlab = NULL, colpct = TRUE, rowpct = FALSE, exclude = c(NA, NaN), rounded = FALSE, ..., subset = NULL) { if (missing(data) || !is.data.frame(data)) stop("pctable requires a data frame") if (missing(formula) || (length(formula) != 3L)) stop("pctable requires a two sided formula") mt <- terms(formula, data = data) if (attr(mt, "response") == 0L) stop("response variable is required") mf <- match.call(expand.dots = FALSE) keepers <- match(c("formula", "data", "subset", "na.action"), names(mf), 0L) mf <- mf[c(1L, keepers)] mf$drop.unused.levels <- FALSE mf[[1L]] <- quote(stats::model.frame) mf <- eval(mf, parent.frame()) ## response is column 1 rvlab <- if (missing(rvlab)) colnames(mf)[1] else rvlab cvlab <- if (missing(cvlab)) colnames(mf)[2] else cvlab res <- pctable.default(mf[[1L]], mf[[2L]], data = mf, rvlab = rvlab, cvlab = cvlab, colpct = colpct, rowpct = rowpct, exclude = exclude, rounded = rounded) invisible(res) } pctable.character <- function(rowvar, colvar, data = NULL, rvlab = NULL, cvlab = NULL, colpct = TRUE, rowpct = FALSE, exclude = c(NA, NaN), rounded = FALSE, ..., subset = NULL) { if (missing(data) || !is.data.frame(data)) stop("pctable requires a data frame") ## colvar <- if (!is.character(colvar)) deparse(substitute(colvar)) else colvar colvar <- as.character(substitute(colvar))[1L] rvlab <- if (missing(rvlab)) rowvar else rvlab cvlab <- if (missing(cvlab)) colvar else cvlab t1 <- with(data, table(data[[rowvar]], data[[colvar]], dnn = c(rvlab, cvlab), exclude = exclude)) rownames(t1)[is.na(rownames(t1))] <- "NA" ## symbol to letters colnames(t1)[is.na(colnames(t1))] <- "NA" if (rounded) t1 <- round(t1, -1) t2 <- addmargins(t1, c(1,2)) t1colpct <- round(100*prop.table(t1, 2), 1) t1rowpct <- round(100*prop.table(t1, 1), 1) t1colpct <- apply(t1colpct, c(1,2), function(x) gsub("NaN", "", x)) t1rowpct <- apply(t1rowpct, c(1,2), function(x) gsub("NaN", "", x)) res <- list("count" = t2, "colpct" = t1colpct, "rowpct" = t1rowpct, call = match.call()) class(res) <- "pctable" print(res, colpct = colpct, rowpct = rowpct) invisible(res) } ## OK, I see now I'm doing the same work over and over, will extract ## a middle chunk out of each of those methods. And finally my cool print method. print.pctable <- function(tab, colpct = TRUE, rowpct = FALSE){ count <- tab[["count"]] t3 <- count if (colpct && !rowpct) { cpct <- tab[["colpct"]] for(j in rownames(cpct)){ for(k in colnames(cpct)){ t3[j, k] <- paste0(count[j, k], "(", cpct[j, k], "%)") } } cat("Count (column %)\n") print(t3) return(invisible(t3)) } ## rowpct == TRUE< else would have returned rpct <- tab[["rowpct"]] for(j in rownames(rpct)){ for(k in colnames(rpct)){ t3[j, k] <- paste0(count[j, k], "(", rpct[j, k], "%)") } } if (!colpct) { cat("Count (row %)\n") print(t3) return(invisible(t3)) } else { cpct <- tab[["colpct"]] t4 <- array("", dim = c(1, 1) + c(2,1)*dim(tab$colpct)) t4[seq(1, NROW(t4), 2), ] <- t3 rownames(t4)[seq(1, NROW(t4), 2)] <- rownames(t3) rownames(t4)[is.na(rownames(t4))] <- "" colnames(t4) <- colnames(t3) for(j in rownames(tab[["colpct"]])) { for(k in colnames(tab[["colpct"]])){ t4[1 + which(rownames(t4) == j) ,k] <- paste0(tab[["colpct"]][j, k], "%") } } names(dimnames(t4)) <- names(dimnames(count)) cat("Count (row %)\n") cat("column %\n") print(t4, quote = FALSE) return(invisible(t4)) } } And usage examples dat <- data.frame(x = gl(4, 25), y = sample(c("A", "B", "C", "D", "E"), 100, replace= TRUE)) ## Here's what I was aiming for, in the beginning pctable(y ~ x, dat) pctable(y ~ x, dat, exclude = NULL) pctable(y ~ x, dat, rvlab = "My Outcome Var", cvlab = "My Columns") ## People who like row percents asked for this pctable(y ~ x, dat, rowpct = TRUE, colpct = FALSE) ## Some people want both. Tiresome. pctable(y ~ x, dat, rowpct = TRUE, colpct = TRUE) pctable(y ~ x, dat, rowpct = TRUE, colpct = TRUE, exclude = NULL) tab <- pctable(y ~ x, dat, rvlab = "My Outcome Var", cvlab = "My Columns") print(tab, rowpct = TRUE, colpct = FALSE) print.pctable(tab, rowpct = TRUE, colpct = TRUE) ## I also wanted an interface that would allow calls like ## pctable(y, x, dat) ## which I was able to do, but not when pctable is a method. ## As long as one writes in an existing variable, this dispatches ## pctable.default and result is OK pctable(dat$y, dat$x) pctable(dat$y, dat$x, rowpct = TRUE, colpct = FALSE) pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE) pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = NULL) tab <- pctable(dat$y, dat$x) print(tab, rowpct = TRUE, colpct = FALSE) print(tab, rowpct = TRUE, colpct = TRUE) pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = c(NA, "E")) pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = c("E")) ## Why do NA's get excluded pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = c("B", "2")) ## This succeeds pctable.default(y, x, dat) ## Next causes error pctable(y, x, dat) ## Error in pctable(y, x, dat) (from #3) : object 'y' not found At one point yesterday, I was on the verge of comprehending the parse tree :) -- Paul E. Johnson Professor, Political Science Acting. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://crmda.ku.edu <http://quant.ku.edu> [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel