I have two questions for the group. One is very concrete, and is dangerously close to a "please do my homework" posting. The second follows from the first one but is more general. I would welcome the advice of experienced R users.
As for the first one: I have a data frame with two variables, X and Y:

    X Y
    A, chris
    D, chris
    B, chris
    B, chris
    C, andrew
    E, andrew
    C, andrew
    B, beth
    D, chris
    D, beth
    C, beth
    D, beth
    D, beth
    A, andrew
    A, andrew
    A, andrew
    C, chris
    B, beth
    D, chris
    E, andrew
    D, chris
    D, beth
    D, chris
    A, andrew
    A, chris
    C, chris
    A, chris
    B, chris
    C, beth
    A, chris

I would like to produce a table with one row for every level of the factor X and multiple columns, filled with the levels of the factor Y that are observed jointly with that level of X. Hence:

    X Z1 Z2 Z3
    A, andrew, chris
    B, chris, beth
    C, andrew, beth, chris
    D, chris, beth
    E, andrew

A solution would be to do something like

    temp = tapply(Y, X, function(a) levels(a[, drop=TRUE]))

and then put the output in an appropriately sized data frame. The issue I have with this is that it is inelegant and rather slow for my typical data set (~200K rows). So I was wondering if a more efficient, nicer solution exists.

This leads me to a second question. Maybe out of laziness, maybe because R is good enough, I tend to do all my local data manipulations in R. This includes de-duping records, joining tables, and grouping observations. I do this also for larger data sets (say, dense tables with 100M+ elements). Is this common practice among R users? If so, is there a tutorial, or an R view on it? If not, what do you use?

Thanks in advance,

-gappy

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
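
For reference, here is a minimal base-R sketch of the reshaping asked about in the first question. It is one possible approach under stated assumptions, not a benchmarked answer: the data frame name `df` and the intermediate names `pairs`, `by_x`, `padded` and `result` are made up for illustration, and the example data is the 30-row sample from the post with X as the letter and Y as the name. The idea is to drop duplicate (X, Y) pairs before grouping, so the per-group work touches only the distinct pairs; whether that is actually faster on ~200K rows is an assumption that would need timing on real data.

```r
# Sample data as read from the post (X = letter, Y = name).
df <- data.frame(
  X = c("A","D","B","B","C","E","C","B","D","D","C","D","D","A","A",
        "A","C","B","D","E","D","D","D","A","A","C","A","B","C","A"),
  Y = c("chris","chris","chris","chris","andrew","andrew","andrew","beth",
        "chris","beth","beth","beth","beth","andrew","andrew","andrew",
        "chris","beth","chris","andrew","chris","beth","chris","andrew",
        "chris","chris","chris","chris","beth","chris"),
  stringsAsFactors = TRUE
)

# Keep each observed (X, Y) combination once.
pairs <- unique(df[, c("X", "Y")])
pairs$Y <- as.character(pairs$Y)

# One sorted character vector of Y values per level of X.
by_x <- lapply(split(pairs$Y, pairs$X), sort)

# Pad each vector with NA to a common length, then stack the rows.
width <- max(lengths(by_x))
padded <- do.call(rbind, lapply(by_x, function(v) {
  c(v, rep(NA_character_, width - length(v)))
}))

# Assemble the wide table: one row per level of X, columns Z1..Zk.
result <- data.frame(X = names(by_x), padded,
                     row.names = NULL, stringsAsFactors = FALSE)
names(result)[-1] <- paste0("Z", seq_len(width))
print(result)
```

The Y values in each row come out sorted alphabetically, and rows with fewer distinct values are padded with NA, so the column order within a row may differ from the hand-written table in the post.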