On 06.04.2011 22:02, Walter Anderson wrote:
I have cobbled together the following logic. It works but is very slow.
I'm sure that there must be a better r-specific way to implement this
kind of thing, but have been unable to find/understand one. Any help
would be appreciated.
    hh.sub <- households[c("HOUSEID","HHFAMINC")]
    for (indx in 1:length(hh.sub$HOUSEID)) {
        if ((hh.sub$HHFAMINC[indx] == '01') | (hh.sub$HHFAMINC[indx] == '02') |
            (hh.sub$HHFAMINC[indx] == '03') | (hh.sub$HHFAMINC[indx] == '04') |
            (hh.sub$HHFAMINC[indx] == '05'))
            hh.sub$CS_FAMINC[indx] <- 1 # Less than $25,000
        if ((hh.sub$HHFAMINC[indx] == '06') | (hh.sub$HHFAMINC[indx] == '07') |
            (hh.sub$HHFAMINC[indx] == '08') | (hh.sub$HHFAMINC[indx] == '09') |
            (hh.sub$HHFAMINC[indx] == '10'))
            hh.sub$CS_FAMINC[indx] <- 2 # $25,000 to $50,000
        if ((hh.sub$HHFAMINC[indx] == '11') | (hh.sub$HHFAMINC[indx] == '12') |
            (hh.sub$HHFAMINC[indx] == '13') | (hh.sub$HHFAMINC[indx] == '14') |
            (hh.sub$HHFAMINC[indx] == '15'))
            hh.sub$CS_FAMINC[indx] <- 3 # $50,000 to $75,000
        if ((hh.sub$HHFAMINC[indx] == '16') | (hh.sub$HHFAMINC[indx] == '17'))
            hh.sub$CS_FAMINC[indx] <- 4 # $75,000 to $100,000
        if ((hh.sub$HHFAMINC[indx] == '18'))
            hh.sub$CS_FAMINC[indx] <- 5 # More than $100,000
        if ((hh.sub$HHFAMINC[indx] == '-7') | (hh.sub$HHFAMINC[indx] == '-8') |
            (hh.sub$HHFAMINC[indx] == '-9'))
            hh.sub$CS_FAMINC[indx] = 0
    }

Hi,
the for-loop is entirely unnecessary. You can, as a first step, handle
each income group with one assignment over the whole column. Note that
if() takes a single TRUE/FALSE, not a whole column, so use a logical
index instead:
    hh.sub$CS_FAMINC <- NA
    low <- hh.sub$HHFAMINC %in% c('01', '02', '03', '04', '05')
    hh.sub$CS_FAMINC[low] <- 1 # Less than $25,000
This very basic concept is called "vectorization" in R. You should read
about it, it rocks.
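
To make the idea concrete, here is a rough sketch of the whole recode written with logical indexing and %in%. It assumes HHFAMINC holds two-character strings as in the original post; the small data frame is made up for illustration, since the real households data is not shown.

```r
# Made-up sample data; the real 'households' data frame is not shown in the post.
hh.sub <- data.frame(HOUSEID = 1:8,
                     HHFAMINC = c("01", "05", "06", "11", "17", "18", "-7", "-9"),
                     stringsAsFactors = FALSE)

# Start with NA everywhere, then fill in each group with one assignment.
hh.sub$CS_FAMINC <- NA_real_
inc <- hh.sub$HHFAMINC
hh.sub$CS_FAMINC[inc %in% c("01", "02", "03", "04", "05")] <- 1  # Less than $25,000
hh.sub$CS_FAMINC[inc %in% c("06", "07", "08", "09", "10")] <- 2  # $25,000 to $50,000
hh.sub$CS_FAMINC[inc %in% c("11", "12", "13", "14", "15")] <- 3  # $50,000 to $75,000
hh.sub$CS_FAMINC[inc %in% c("16", "17")]                   <- 4  # $75,000 to $100,000
hh.sub$CS_FAMINC[inc %in% "18"]                            <- 5  # More than $100,000
hh.sub$CS_FAMINC[inc %in% c("-7", "-8", "-9")]             <- 0  # "not available" codes, as in the loop

hh.sub$CS_FAMINC
```

Each line touches every matching row at once, so there is no per-row loop and no repeated modification of the data frame, which is probably where most of the time went in the original.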
In this case, though, you don't even need to do that:
If you cast the variable HHFAMINC into a number like this:
    hh.sub$HHFAMINC <- as.numeric(as.character(hh.sub$HHFAMINC))
(the as.character() step matters if the column is a factor), then you can
apply the cut() function to create a factor variable:
    hh.sub$myawesomefactor <- cut(hh.sub$HHFAMINC,
                                  breaks=c(0, 5.5, 10.5, 15.5, 17.5, Inf))
or something like that should do the trick. You will then have to rename
the factor levels. The function for that is levels(), not names();
alternatively, you can pass the labels= argument to cut() directly.
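
A minimal sketch of the cut() route, assuming the codes have already been converted to numbers. The sample vector and the label strings are made up for illustration; the outer breaks 0 and Inf are my assumption so that all five income groups get a bin and the negative codes fall outside every interval. In current R, factor values are renamed with levels() or set up front with the labels= argument.

```r
# Made-up codes, already numeric; negative values stand for "not available".
faminc <- c(1, 5, 6, 11, 17, 18, -7, -9)

# right = TRUE (the default) makes each interval closed on the right,
# e.g. (0, 5.5] holds codes 1 to 5. Values outside all breaks become NA.
income <- cut(faminc,
              breaks = c(0, 5.5, 10.5, 15.5, 17.5, Inf),
              labels = c("Less than $25,000", "$25,000 to $50,000",
                         "$50,000 to $75,000", "$75,000 to $100,000",
                         "More than $100,000"))

# Levels can also be renamed after the fact with levels().
levels(income)[5] <- "Over $100,000"

table(income, useNA = "ifany")
```

Six break points give five intervals, one per income group, and the negative codes come out as NA without any extra step.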
Also, this might be my OCD speaking, but I would use NA instead of 0 for
non-available values.
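
A small sketch of why NA tends to be safer than 0 here: a 0 would be treated as a real value by summaries, while NA has to be skipped explicitly. The vector is made up; -7, -8 and -9 are the "not available" codes from the original post.

```r
# Made-up numeric codes; -7, -8 and -9 are the "not available" codes from the post.
faminc <- c(3, 12, -7, 18, -9)

# Replace the negative codes with NA in one vectorized step.
faminc[faminc %in% c(-7, -8, -9)] <- NA

sum(is.na(faminc))           # how many values are missing
mean(faminc, na.rm = TRUE)   # summaries must opt in to skipping NA
```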
Have fun,
Alex
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.