I have a data set covering a large number of cities with values for 
characteristics such as land area, population, and employment. The problem I 
have is that some cities lack observations for some of the characteristics and 
I'd like a quick way to determine which cities have missing data.  For example:

city<-c("A","A","A","B","B","C") 
var<-c("sqmi","pop","emp","pop","emp","pop")
value<-c(10,100,40,30,10,20)
df<-data.frame(city,var,value)

In this data frame, city A has complete data for the three variables, while 
city B is missing land area, and city C only has population data. In the full 
data frame, my approach to finding the missing observations has been to create 
a data frame with all combinations of 'city' and 'var', merge this onto the 
original data frame, and then extract the observations with missing data for 
'value':

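# all cities and variables that should be present
city_unq<-c("A","B","C")
var_unq<-c("sqmi","pop","emp")
# every city/var combination
comb<-expand.grid(city=city_unq,var=var_unq)

# combinations absent from df come back with value = NA
mrg<-merge(comb,df,by=c("city","var"),all=TRUE)
missing<-mrg[is.na(mrg$value),]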

This works, but on a large data set it gets slow, and I'm looking for a more 
efficient way to achieve the same result.  Any suggestions would be much 
appreciated.
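
One candidate alternative, shown only as a sketch: cross-tabulate 'city' 
against 'var' with table() and keep the cells with a zero count. The names 
tab, long and missing_tab are illustrative, and whether this is actually 
faster than the merge on the full data is an assumption that would need 
timing.

```r
# Toy data from the example above
city  <- c("A", "A", "A", "B", "B", "C")
var   <- c("sqmi", "pop", "emp", "pop", "emp", "pop")
value <- c(10, 100, 40, 30, 10, 20)
df    <- data.frame(city, var, value)

# Count how often each city/var pair occurs; a zero marks a missing pair
tab <- table(city = df$city, var = df$var)

# Convert the table to long form (columns city, var, Freq) and keep
# only the pairs that have no observation
long <- as.data.frame(tab, stringsAsFactors = FALSE)
missing_tab <- long[long$Freq == 0, c("city", "var")]
```

On the toy data this flags the same three pairs as the merge: B/sqmi, 
C/sqmi and C/emp. It only considers cities and variables that appear at 
least once in df, so a city with no rows at all would not show up.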

Cheers
                                          

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
