[Rd] Option "installWithVers" seems to impact new.packages() badly?
In R-devel, SVN version built this morning around 10am Central European Time, install.packages(new.packages(), installWithVers=TRUE) seems to ignore the version information: it reinstalls packages whose current versions are already installed.

This did not happen before I used the "installWithVers=TRUE" option; until then I could use update.packages() and install.packages(new.packages()) to keep a current, platform-complete installed base of CRAN.

best,
-tony

[EMAIL PROTECTED]
Muttenz, Switzerland.
"Commit early, commit often, and commit in a repository from which we can easily roll-back your mistakes" (AJR, 4Jan05).

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] bug in gsub with perl=TRUE (PR#8164)
Full_Name: Richard Mott
Version: 2.0.1
OS: Linux toad 2.6.9 #4 SMP Mon Feb 21 16:20:16 GMT 2005 x86_64 AMD Opteron(tm) Processor 848 AuthenticAMD GNU/Linux
Submission from: (NULL) (129.67.46.247)

gsub with perl=TRUE does not work properly. It pads/truncates the resulting string to the length of the input string:

my.formula <- "log10(Biochem.ALP)^2+1 ~ Family + GENDER"
> gsub("^.+~", "transformed.y ~", my.formula )
[1] "transformed.y ~ Family + GENDER"
> gsub("^.+~", "transformed.y ~", my.formula, perl=TRUE )
[1] "transformed.y ~ Family + GENDER\0\006\0\0\r\377\0\0\0" # padded

my.formula <- "Biochem.ALP ~ Family + GENDER"
> gsub("^.+~", "transformed.y ~", my.formula, perl=TRUE )
[1] "transformed.y ~ Family + GEND" # truncated
> gsub("^.+~", "transformed.y ~", my.formula )
[1] "transformed.y ~ Family + GENDER"
Re: [Rd] bug in gsub with perl=TRUE (PR#8164)
[EMAIL PROTECTED] wrote:
> Full_Name: Richard Mott
> Version: 2.0.1

This version is completely outdated. Please try with a *recent* version of R when reporting bugs, in this case R-2.2.0 beta (or, in the worst case, R-2.1.1, the current release).

The bug reported below was fixed some months ago ...

Uwe Ligges

> OS: Linux toad 2.6.9 #4 SMP Mon Feb 21 16:20:16 GMT 2005 x86_64 AMD
> Opteron(tm) Processor 848 AuthenticAMD GNU/Linux
> Submission from: (NULL) (129.67.46.247)
>
> gsub with perl=TRUE does not work properly. It pads/truncates the resulting
> string to the length of the input string:
>
> my.formula <- "log10(Biochem.ALP)^2+1 ~ Family + GENDER"
>
>>gsub("^.+~", "transformed.y ~", my.formula )
> [1] "transformed.y ~ Family + GENDER"
>
>>gsub("^.+~", "transformed.y ~", my.formula, perl=TRUE )
> [1] "transformed.y ~ Family + GENDER\0\006\0\0\r\377\0\0\0" # padded
>
> my.formula <- "Biochem.ALP ~ Family + GENDER"
>
>>gsub("^.+~", "transformed.y ~", my.formula, perl=TRUE )
> [1] "transformed.y ~ Family + GEND" # truncated
>
>>gsub("^.+~", "transformed.y ~", my.formula )
> [1] "transformed.y ~ Family + GENDER"
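For anyone re-testing this report, here is a minimal sketch, assuming a current R release in which the fix Uwe mentions is present. The names `formulas`, `default.engine` and `perl.engine` are illustrative only and do not come from the report. It checks that the default and perl=TRUE code paths return the same strings for both formulas.

```r
# Both inputs from the bug report: one longer, one shorter than the result
formulas <- c("log10(Biochem.ALP)^2+1 ~ Family + GENDER",
              "Biochem.ALP ~ Family + GENDER")

default.engine <- gsub("^.+~", "transformed.y ~", formulas)
perl.engine <- gsub("^.+~", "transformed.y ~", formulas, perl = TRUE)

# Neither result should be padded or truncated relative to the other
stopifnot(identical(default.engine, perl.engine))
print(perl.engine)
```

If the two vectors differ, the installed R is probably old enough to still contain the bug.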
[Rd] Subscripting fails if name of element is "" (PR#8161)
Dear all,

I am resending this mail because it was blocked: I submitted a bug from the r-bug webpage, and hypatia seems to block mail that is sent from a different IP than the one usually associated with the email address. It looks like it is currently impossible to submit bugs correctly from the website. Here is the original bug report:

(PR#8161)

Dear all,

The following shows cases where accessing elements via their name fails (if the name is a string of length zero).

Best regards

Jens Oehlschlägel

> p <- 1:3
> names(p) <- c("a","", as.character(NA))
> p
a
1 2 3
>
> for (i in names(p))
+ print(p[[i]])
[1] 1
[1] 2
[1] 3
>
> # error 1: vector subscripting with "" fails in second element
> for (i in names(p))
+ print(p[i])
a
1
NA
3
>
> # error 2: print method for list shows no name for second element
> p <- as.list(p)
>
> for (i in names(p))
+ print(p[[i]])
[1] 1
[1] 2
[1] 3
>
> # error 3: list subscripting with "" fails in second element
> for (i in names(p))
+ print(p[i])
$a
[1] 1

$"NA"
NULL

$"NA"
[1] 3

> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.1
year     2005
month    06
day      20
language R

# -- replication code --

p <- 1:3
names(p) <- c("a","", as.character(NA))
p

for (i in names(p))
  print(p[[i]])

# error 1: vector subscripting with "" fails in second element
for (i in names(p))
  print(p[i])

# error 2: print method for list shows no name for second element
p <- as.list(p)

for (i in names(p))
  print(p[[i]])

# error 3: list subscripting with "" fails in second element
for (i in names(p))
  print(p[i])
Re: [Rd] Summary of translation status
On 9/28/2005 11:50 PM, Fernando Henrique Ferraz P. da Rosa wrote:
> Dear R-devel & Translation Teams,
>
> In order to monitor the progress of the translation for the pt_BR team I
> wrote a script to summarize the status of the translations. It wasn't
> difficult to extend it to the other languages, so I decided to set up a page
> with the summaries of the translation for all languages for which a
> translation currently exists.
>
> http://www.ime.usp.br/~feferraz/en/rtransstat.html
>
> If any of you find it useful I can keep it updated on a regular basis
> (daily or weekly).
>
> Thank you,
>
> (PS: I'm resending this message because it didn't get through the filter
> the first time. Sorry for the inconvenience for those that are receiving it
> more than one time).

Hi Fernando. That's a nice page. I'd add an explicit statement about which branch the statistics apply to. You say "Statistics based on SVN: 35706", presumably on the trunk, but soon interest will shift to the R-2-2-patches branch. (If this is automated and you have the disk space for both, perhaps both trunk and the current patch branch could be listed, but I expect the statistics will be very similar.)

Duncan Murdoch
Re: [Rd] Subscripting fails if name of element is "" (PR#8161)
On Fri, 30 Sep 2005, "Jens Oehlschlägel" wrote:

> Dear all,
>
> The following shows cases where accessing elements via their name fails (if
> the name is a string of length zero).

This looks deliberate (there is a function NonNullStringMatch that does the matching). I assume this is because there is no other way to indicate that an element has no name.

If so, it is a documentation bug -- help(names) and FAQ 7.14 should specify this behaviour. Too late for 2.2.0, unfortunately.

-thomas

> Best regards
>
> Jens Oehlschlägel
>
> [...]

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle
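The behaviour Thomas describes can be seen in a short sketch, assuming a current R release; the variable names `by.name`, `pos` and `by.position` are illustrative, and the positional lookup is only one possible workaround, not something proposed in the thread.

```r
p <- 1:3
names(p) <- c("a", "", NA)

# A character subscript never matches an empty name, so this yields NA
by.name <- p[""]

# Assumed workaround: find the empty name by position and index with that
pos <- which(names(p) == "")
by.position <- p[pos]

print(by.name)
print(by.position)
```

Because `which()` drops the NA produced by comparing the missing name, `pos` contains only the position of the element whose name is the empty string.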
[Rd] Compiling R on OpenSolaris
Hi:

I am trying to get R (v2.1.1) to compile on OpenSolaris (build 22) using gcc version 3.4.3, but I am getting this error message:

gmake[5]: Entering directory `/usr/local/R-2.1.1/src/library/tools/src'
../../../../library/tools/libs/tools.so is unchanged
gmake[5]: Leaving directory `/usr/local/R-2.1.1/src/library/tools/src'
gmake[4]: Leaving directory `/usr/local/R-2.1.1/src/library/tools/src'
Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library '/usr/local/R-2.1.1/library/tools/libs/tools.so':
ld.so.1: R: fatal: relocation error: R_AMD64_PC32: file /usr/local/R-2.1.1/library/tools/libs/tools.so: symbol main: value 0x28001413f04 does not fit
Execution halted
gmake[3]: *** [all] Error 1
gmake[3]: Leaving directory `/usr/local/R-2.1.1/src/library/tools'

I am on an Athlon64 box, so I put in -m64 -mtune=athlon64 as the compiler options.

Any help on how I can fix this problem would be much appreciated.

thanks
---Vincent
[Rd] by() processing on a dataframe
I want to calculate a statistic on a number of subgroups of a dataframe, then put the results into a dataframe. (What SAS PROC MEANS does, I think, though it's been years since I used it.)

This is possible using by(), but it seems cumbersome and fragile. Is there a more straightforward way than this?

Here's a simple example showing my current strategy:

> dataset <- data.frame(gp1 = rep(1:2, c(4,4)), gp2 = rep(1:4, c(2,2,2,2)), value = rnorm(8))
> dataset
  gp1 gp2      value
1   1   1  0.9493232
2   1   1 -0.0474712
3   1   2 -0.6808021
4   1   2  1.9894999
5   2   3  2.0154786
6   2   3  0.4333056
7   2   4 -0.4746228
8   2   4  0.6017522
>
> handleonegroup <- function(subset) data.frame(gp1 = subset$gp1[1],
+ gp2 = subset$gp2[1], statistic = mean(subset$value))
>
> bylist <- by(dataset, list(dataset$gp1, dataset$gp2), handleonegroup)
>
> result <- do.call('rbind', bylist)
> result
   gp1 gp2  statistic
1    1   1 0.45092598
11   1   2 0.65434890
12   2   3 1.22439210
13   2   4 0.06356469

tapply() is inappropriate because I don't have all possible combinations of gp1 and gp2 values, only some of them:

> tapply(dataset$value, list(dataset$gp1, dataset$gp2), mean)
         1         2        3          4
1 0.450926 0.6543489       NA         NA
2       NA        NA 1.224392 0.06356469

In the real case, I only have a very sparse subset of all the combinations, and tapply() and by() both die for lack of memory.

Any suggestions on how to do what I want, without using SAS?

Duncan Murdoch
Re: [Rd] by() processing on a dataframe
I'm not entirely sure what you want, but maybe this does the trick?

data.frame.by <- function(data, variables, fun, ...) {
  if (length(variables) == 0 ) {
    df <- data.frame(results = 0)
    df$results <- list(fun(data$value, ...))
    return(df)
  }

  # sort the data once, so that the group index computed below lines up
  # with the rows that are handed to by()
  data <- sort.df(data, variables)
  sorted <- data[, variables, drop=FALSE]
  duplicates <- duplicated(sorted)
  index <- cumsum(!duplicates)

  results <- by(data, index, fun, ...)

  # one row of grouping values per group, plus a column of results
  cols <- sorted[!duplicates, , drop=FALSE]
  cols$results <- array(results)
  cols
}

# order the rows of a data frame by the given columns
sort.df <- function(data, vars) {
  data[do.call("order", data[,vars, drop=FALSE]), ,drop=FALSE]
}

dataset <- data.frame(gp1 = rep(1:2, c(4,4)), gp2 = rep(1:4, c(2,2,2,2)), value = rnorm(8))

data.frame.by(dataset, c("gp1", "gp2"), function(data) mean(data$value))
data.frame.by(dataset, "gp1", function(data) tapply(data$value, data$gp2, mean))
data.frame.by(dataset, "gp1", function(data) lm(gp2 ~ value, data)) # doesn't print, but everything is there ok

(note that the results column will be a list if necessary - this may be a serious abuse of data frames, but I'm not sure and no one replied when I queried the list)

Hadley
Re: [Rd] by() processing on a dataframe
Duncan Murdoch <[EMAIL PROTECTED]> writes:

> I want to calculate a statistic on a number of subgroups of a dataframe,
> then put the results into a dataframe. (What SAS PROC MEANS does, I
> think, though it's been years since I used it.)
>
> [...]
>
> In the real case, I only have a very sparse subset of all the
> combinations, and tapply() and by() both die for lack of memory.
>
> Any suggestions on how to do what I want, without using SAS?

Have you tried aggregate()?

Alternatively, you might split on interaction(, drop=TRUE)

--
   O__  Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])                             FAX: (+45) 35327907
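Peter's second suggestion can be sketched as follows, assuming the two grouping columns are what goes into interaction(). The `value` column uses made-up fixed numbers instead of rnorm() so the result is reproducible, and the names `groups`, `pieces` and `result` are illustrative only.

```r
dataset <- data.frame(gp1 = rep(1:2, c(4, 4)),
                      gp2 = rep(1:4, c(2, 2, 2, 2)),
                      value = c(1, 3, 5, 7, 2, 4, 6, 8))

# drop = TRUE keeps only the gp1/gp2 combinations that occur in the data
groups <- interaction(dataset$gp1, dataset$gp2, drop = TRUE)

# One data frame per observed combination
pieces <- split(dataset, groups)

# Summarise each piece and stack the one-row results
result <- do.call(rbind, lapply(pieces, function(d)
  data.frame(gp1 = d$gp1[1], gp2 = d$gp2[1], statistic = mean(d$value))))

print(result)
```

This keeps the result free of empty groups, though it does not by itself say anything about how much memory interaction() needs on very sparse data.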
Re: [Rd] by() processing on a dataframe
Check out summaryBy in the doBy package at:

http://genetics.agrsci.dk/~sorenh/misc

e.g.

summaryBy(value ~ gp1 + gp2, data = dataset)

On 9/30/05, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> I want to calculate a statistic on a number of subgroups of a dataframe,
> then put the results into a dataframe. (What SAS PROC MEANS does, I
> think, though it's been years since I used it.)
>
> [...]
>
> Any suggestions on how to do what I want, without using SAS?
>
> Duncan Murdoch
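A base-R call with the same formula shape is sketched below. It assumes a current R release, where aggregate() has a formula method; that is an assumption about newer versions than the ones discussed in this thread. The fixed `value` numbers are made up for reproducibility.

```r
dataset <- data.frame(gp1 = rep(1:2, c(4, 4)),
                      gp2 = rep(1:4, c(2, 2, 2, 2)),
                      value = c(1, 3, 5, 7, 2, 4, 6, 8))

# Left of ~ : the column to summarise; right of ~ : the grouping columns
means <- aggregate(value ~ gp1 + gp2, data = dataset, FUN = mean)

print(means)
```

Only the column named on the left-hand side is summarised, and only the combinations of gp1 and gp2 that occur in the data appear in the result.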
Re: [Rd] by() processing on a dataframe
On Fri, 2005-09-30 at 13:22 -0400, Duncan Murdoch wrote:
> I want to calculate a statistic on a number of subgroups of a dataframe,
> then put the results into a dataframe. (What SAS PROC MEANS does, I
> think, though it's been years since I used it.)
>
> [...]
>
> Any suggestions on how to do what I want, without using SAS?
>
> Duncan Murdoch

Duncan,

Does this do what you want?

> set.seed(1)
> df <- data.frame(gp1 = rep(1:2, c(4,4)), gp2 = rep(1:4, c(2,2,2,2)), value = rnorm(8))
> df
  gp1 gp2      value
1   1   1 -0.6264538
2   1   1  0.1836433
3   1   2 -0.8356286
4   1   2  1.5952808
5   2   3  0.3295078
6   2   3 -0.8204684
7   2   4  0.4874291
8   2   4  0.7383247

> means <- aggregate(df$value, list(gp1 = df$gp1, gp2 = df$gp2), mean)
> means
  gp1 gp2          x
1   1   1 -0.2214052
2   1   2  0.3798261
3   2   3 -0.2454803
4   2   4  0.6128769

> merge(df, means, by = c("gp1", "gp2"))
  gp1 gp2      value          x
1   1   1 -0.6264538 -0.2214052
2   1   1  0.1836433 -0.2214052
3   1   2 -0.8356286  0.3798261
4   1   2  1.5952808  0.3798261
5   2   3  0.3295078 -0.2454803
6   2   3 -0.8204684 -0.2454803
7   2   4  0.4874291  0.6128769
8   2   4  0.7383247  0.6128769

HTH,

Marc Schwartz
Re: [Rd] by() processing on a dataframe
On 9/30/2005 1:41 PM, Peter Dalgaard wrote:
> Duncan Murdoch <[EMAIL PROTECTED]> writes:
>
>> I want to calculate a statistic on a number of subgroups of a dataframe,
>> then put the results into a dataframe.
>>
>> [...]
>>
>> Any suggestions on how to do what I want, without using SAS?
>
> Have you tried aggregate()?

aggregate() has a few problems:

- It applies the function to every column in the dataframe. In my case it only makes sense to apply it to some of them. (This may not be a killer, but it certainly makes things inefficient and tricky.)
- I'd like to look at the whole subset to figure out the function (but I can probably work around this).
- It uses too much memory. E.g. try

> df <- data.frame(x=rnorm(1000), y=rnorm(1000), z=rnorm(1000), w=rnorm(1000))
> aggregate(df, list(df$x,df$y,df$z), mean)
Error: cannot allocate vector of size 3906250 Kb
In addition: Warning messages:
1: Reached total allocation of 1007Mb: see help(memory.size)
2: Reached total allocation of 1007Mb: see help(memory.size)

This should have returned the same dataframe (there are 1000 subsets), but it tried to construct a billion of them.

On 9/30/2005 1:48 PM, Don MacQueen wrote:
> Look at the summarize() function in the Hmisc package.

It seems to want a matrix, not a data.frame. The real situation has mixed types (character, factors, numeric) so it can't be a matrix.

> (and this is an r-help question, not an r-devel question, I would think)

Yes, that's where I should have posted. Sorry. However, this is starting to look like a development problem...

Peter again:
> Alternatively, you might split on interaction(, drop=TRUE)

Looking at the code, it appears that will construct the full product interaction, then subset to the non-empty cases... Yes, it does that.

Looks like I'll have to write my own.

Duncan
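One way such a hand-written version might look is sketched below. This is not Duncan's code: `group_apply` is a hypothetical helper, it assumes `fun` returns a single number per group, and the fixed `value` numbers are made up. The idea is to build one key string per row, so that only combinations present in the data are ever represented and no full cross product of levels is formed.

```r
# Hypothetical helper: apply `fun` to each observed combination of `vars`
group_apply <- function(data, vars, fun, ...) {
  # One key per row, pasted from the grouping columns
  key <- do.call(paste, c(unname(as.list(data[vars])), sep = "\r"))

  # Row indices per key, in order of first appearance
  rows <- split(seq_len(nrow(data)), factor(key, levels = unique(key)))

  # Grouping values are taken from the first row of each group
  first <- vapply(rows, function(i) i[1], integer(1))
  out <- data[first, vars, drop = FALSE]

  # `fun` sees the whole subset, and is assumed to return one number
  out$statistic <- unname(vapply(rows, function(i)
    fun(data[i, , drop = FALSE], ...), numeric(1)))
  rownames(out) <- NULL
  out
}

dataset <- data.frame(gp1 = rep(1:2, c(4, 4)),
                      gp2 = rep(1:4, c(2, 2, 2, 2)),
                      value = c(1, 3, 5, 7, 2, 4, 6, 8))

result <- group_apply(dataset, c("gp1", "gp2"), function(d) mean(d$value))
print(result)
```

The number of groups here is bounded by the number of rows, so the 1000-row example above would produce at most 1000 subsets. Pasting keys is a design shortcut; it assumes the grouping values do not themselves contain the separator character.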
Re: [Rd] by() processing on a dataframe
On 9/30/2005 1:41 PM, hadley wickham wrote:
> I'm not entirely sure what you want, but maybe this does the trick?
>
> data.frame.by <- function(data, variables, fun, ...) {
> [...]
>
> (note that the results column will be a list if necessary - this may
> be a serious abuse of data frames, but I'm not sure and no one replied
> when I queried the list)

I think this should work. Thanks!

Duncan Murdoch
Re: [Rd] by() processing on a dataframe
And here is one more approach using the reshape package:

library(reshape)
dataset.d <- melt(dataset, id = 1:2)
cast(dataset.d, gp1 + gp2 ~ variable, mean)

On 9/30/05, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> Check out summaryBy in the doBy package at:
>
> http://genetics.agrsci.dk/~sorenh/misc
>
> e.g.
>
> summaryBy(value ~ gp1 + gp2, data = dataset)
>
> On 9/30/05, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> > I want to calculate a statistic on a number of subgroups of a dataframe,
> > then put the results into a dataframe.
> >
> > [...]
> >
> > Any suggestions on how to do what I want, without using SAS?
> >
> > Duncan Murdoch
Re: [Rd] Summary of translation status
Duncan Murdoch writes:
>
> Hi Fernando. That's a nice page. I'd add an explicit statement about
> which branch the statistics apply to. You say "Statistics based on SVN:
> 35706", presumably on the trunk, but soon interest will shift to the
> R-2-2-patches branch. (If this is automated and you have the disk space
> for both, perhaps both trunk and the current patch branch could be
> listed, but I expect the statistics will be very similar.)
>
> Duncan Murdoch

Hmm, that's true. I'm using the trunk branch. The process is somewhat automated, but I'd have to keep a working copy of the patch branch in order to run the status on it.

Anyway, I also think the statistics will probably be the same. Every time I submit a translation I see it appearing on trunk, so I think it won't matter much.

--
Fernando Henrique Ferraz P. da Rosa
http://www.feferraz.net