Hello,

You could build your output dataframe along the following lines:
# TRUE if all elements of x are the same value
foo <- function(x) length( unique(x) ) == 1

# One row per id (the ids end up as the row names);
# freq counts the rows for each id, var1/var2 flag constant columns
results <- data.frame(
  freq = tapply( dat$id, dat$id, length ),
  var1 = tapply( dat$var1, dat$id, foo ),
  var2 = tapply( dat$var2, dat$id, foo )
)

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On Tue, 2009-01-13 at 14:17 -0500, Doran, Harold wrote:
> Suppose I have a dataframe as follows:
>
> dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25), var2 =
> c('foo', 'foo', 'foo', 'foobar', 'foo'))
>
> Now, if I were to subset by id, such as:
>
> > subset(dat, id==1)
>   id var1 var2
> 1  1   10  foo
> 2  1   10  foo
>
> I can see that the elements in var1 are exactly the same and the
> elements in var2 are exactly the same. However,
>
> > subset(dat, id==2)
>   id var1   var2
> 3  2   20    foo
> 4  2   20 foobar
> 5  2   25    foo
>
> Shows the elements are not the same for either variable in this
> instance. So, what I am looking to create is a data frame that would be
> like this
>
> id freq  var1  var2
>  1    2  TRUE  TRUE
>  2    3 FALSE FALSE
>
> Where freq is the number of times the ID is repeated in the dataframe. A
> TRUE appears in the cell if all elements in the column are the same for
> the ID and FALSE otherwise. It is insignificant which values differ for
> my problem.
>
> The way I am thinking about tackling this is to loop through the ID
> variable and compare the values in the various columns of the dataframe.
> The problem I am encountering is that I don't think all.equal or
> identical are the right functions in this case.
>
> So, say I was wanting to compare the elements of var1 for id ==1. I
> would have
>
> x <- c(10,10)
>
> Of course, the following works
>
> > all.equal(x[1], x[2])
> [1] TRUE
>
> As would a similar call to identical. However, what if I only have a
> vector of values (or if the column consists of names) that I want to
> assess for equality when I am trying to automate a process over
> thousands of cases? As in the example above, the vector may contain only
> two values or it may contain many more.
> The number of values in the vector differs by id.
>
> Any thoughts?
>
> Harold
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
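If id should appear as an explicit column, as in the requested layout, and every non-id column should be checked without naming each one, the same length(unique(x)) == 1 test can be sketched with split() as below. This is only a sketch under stated assumptions: all_same(), vars, groups and flags are hypothetical names introduced for illustration, and the ids are assumed to be numeric, as in the example data. Note that unique() compares values exactly (no numerical tolerance, unlike all.equal()), so numbers that differ only by floating-point rounding count as different.

```r
# Hypothetical helper, same test as foo() in the reply:
# TRUE when every element of x has the same value.
all_same <- function(x) length(unique(x)) == 1

dat <- data.frame(id   = c(1, 1, 2, 2, 2),
                  var1 = c(10, 10, 20, 20, 25),
                  var2 = c("foo", "foo", "foo", "foobar", "foo"))

# Check every column except id, so added columns need no extra code
vars   <- setdiff(names(dat), "id")
groups <- split(dat[vars], dat$id)   # one small data frame per id

# One row of TRUE/FALSE flags per id, one column per variable
flags <- do.call(rbind, lapply(groups, function(g) sapply(g, all_same)))

# Assumes numeric ids, as in the example, so the group names convert back
results <- data.frame(id   = as.numeric(names(groups)),
                      freq = sapply(groups, nrow),  # rows per id
                      flags,
                      row.names = NULL)
print(results)
```

Because split() does the grouping once, the number of rows per id and the number of columns can both vary freely; each id simply contributes one row to results.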