Hi Harold: The code below works on your data set, but please check it carefully,
because I am a little worried that I could have missed something. Hopefully
someone can suggest a clearer way.
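dat <- data.frame(id = c(1, 1, 2, 2, 2), var1 = c(10, 10, 20, 20, 25),
                  var2 = c('foo', 'foo', 'foo', 'foobar', 'foo'))
print(dat)
# For each id: count its rows and test whether every value of var1
# (and of var2) matches the first value for that id
temp <- lapply(split(dat, dat$id), function(.df) {
  data.frame(id = .df$id[1], freq = nrow(.df),
             var1 = all(.df$var1 %in% .df$var1[1]),
             var2 = all(.df$var2 %in% .df$var2[1]))
})
# Stack the one-row summaries into a single data frame
result <- do.call(rbind, temp)
print(result)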
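A possible extension, not part of the original reply: instead of hard-coding var1 and var2, the same first-element test can be applied to every column other than id, which may help when the real data has many columns. The helper all_same is a hypothetical name introduced only for this sketch.

```r
dat <- data.frame(id = c(1, 1, 2, 2, 2), var1 = c(10, 10, 20, 20, 25),
                  var2 = c('foo', 'foo', 'foo', 'foobar', 'foo'))

# Hypothetical helper: TRUE when every element matches the first one,
# using the same %in% test as the reply above
all_same <- function(x) all(x %in% x[1])

temp <- lapply(split(dat, dat$id), function(.df) {
  # Apply the test to every column except id; gives a named list of flags
  flags <- lapply(.df[setdiff(names(.df), "id")], all_same)
  # data.frame() turns each list component into its own column
  data.frame(id = .df$id[1], freq = nrow(.df), flags)
})
result <- do.call(rbind, temp)
print(result)
```

Note that NA %in% NA is TRUE, so a group whose values are all NA is reported as TRUE; a different helper would be needed if that is not the desired behaviour.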
On Tue, Jan 13, 2009 at 2:17 PM, Doran, Harold wrote:
Suppose I have a dataframe as follows:
dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25), var2 =
c('foo', 'foo', 'foo', 'foobar', 'foo'))
Now, if I were to subset by id, such as:
subset(dat, id==1)
  id var1 var2
1  1   10  foo
2  1   10  foo
I can see that the elements of var1 are all exactly the same, and so are the
elements of var2. However,
subset(dat, id==2)
  id var1   var2
3  2   20    foo
4  2   20 foobar
5  2   25    foo
shows that the elements are not the same for either variable in this
instance. So, what I am looking to create is a data frame like this:
id freq  var1  var2
 1    2  TRUE  TRUE
 2    3 FALSE FALSE
Here, freq is the number of times the ID appears in the dataframe. A cell
contains TRUE if all elements in that column are the same for the ID, and
FALSE otherwise. For my problem it does not matter which values differ.
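The table described above can also be sketched in base R without an explicit loop, using table() for freq and tapply() for the flags. This is an alternative to the split/lapply reply, not the method from the thread; "all the same" is tested here as "exactly one distinct value", and the helper one_value is a hypothetical name.

```r
dat <- data.frame(id = c(1, 1, 2, 2, 2), var1 = c(10, 10, 20, 20, 25),
                  var2 = c('foo', 'foo', 'foo', 'foobar', 'foo'))

# Hypothetical helper: TRUE when a group holds a single distinct value
one_value <- function(x) length(unique(x)) == 1L

counts <- table(dat$id)  # rows per id, ordered by id
result <- data.frame(
  id   = as.numeric(names(counts)),
  freq = as.vector(counts),
  # tapply() groups by id in the same sorted order as table()
  var1 = as.vector(tapply(dat$var1, dat$id, one_value)),
  var2 = as.vector(tapply(dat$var2, dat$id, one_value))
)
print(result)
```

Each extra column needs its own tapply() line here, so the split/lapply form may scale more easily when there are many columns.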
The way I am thinking about tackling this is to loop through the ID
variable and compare the values in the various columns of the dataframe.
The problem I am encountering is that I don't think all.equal or
identical are the right functions in this case.
Say I want to compare the elements of var1 for id == 1. I would have
x <- c(10,10)
Of course, the following works
all.equal(x[1], x[2])
[1] TRUE
A similar call to identical would also work. However, what if I have a
whole vector of values (or a column of names) that I want to test for
equality while automating the process over thousands of cases? As in the
example above, the vector may contain only two values or many more; the
number of values in the vector differs by id.
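Since all.equal() and identical() compare two objects, one length-independent way to test a single vector is to compare every element with the first, or to count distinct values. A minimal sketch; the vectors x, y and nm are illustrative only.

```r
x  <- c(10, 10)                      # two values, as in the question
y  <- c(20, 20, 25)                  # more values, not all equal
nm <- c("foo", "foo", "foo", "foo")  # a column of names

# Compare every element with the first; works for any length >= 1,
# but returns NA if the vector contains NA
all(x == x[1])    # TRUE
all(y == y[1])    # FALSE
all(nm == nm[1])  # TRUE

# Equivalent idea: a constant vector has exactly one distinct value
length(unique(y)) == 1L  # FALSE

# identical() still applies if the first element is repeated to full length
identical(nm, rep(nm[1], length(nm)))  # TRUE
```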
Any thoughts?
Harold
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.