Dear all, I have a partial data set with four columns. The first column, "site", is a factor with three levels (A, B, and C). The second through fourth columns (v1 to v3) are my observations. In the observations, "." indicates a missing value, which I replaced with NA in two steps. First, I replaced "." with NA; then I converted each variable from factor to numeric using "df1$v1 <- as.numeric(df1$v1)". The second step is fine with a small number of variables, but it becomes painful with many.
My question is: is there a more efficient way to convert a large data set like this? In short, I am looking for an alternative to STEP 2, or to the whole procedure if there is one. Any comments will be highly appreciated. Thank you in advance!!

Steve

P.S.: Below is an example of what I did.

STEP 1.

> df1
   site   v1  v2 v3
1     A   10   5  .
2     A   22  54  .
3     A   44 214  2
4     A  521  14  4
5     A    5  73  1
6     A 1654 0.4  4
7     B   16   1  .
8     B    .   4  5
9     B    .   .  4
10    B    .   4  1
11    B   51   .  2
12    B    5   .  .
13    C    1 0.4  .
14    C    0   4  .
15    C    1   1  4
16    C   40   .  7
17    C    4   .  7
18    C   10   .  1
> str(df1)
'data.frame':   18 obs. of  4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 1 1 1 ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 1 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: 1 1 3 4 2 4 1 5 4 2 ...
> df1[df1=="."] <- "NA"
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
  invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
  invalid factor level, NAs generated
> df1
   site   v1   v2   v3
1     A   10    5 <NA>
2     A   22   54 <NA>
3     A   44  214    2
4     A  521   14    4
5     A    5   73    1
6     A 1654  0.4    4
7     B   16    1 <NA>
8     B <NA>    4    5
9     B <NA> <NA>    4
10    B <NA>    4    1
11    B   51 <NA>    2
12    B    5 <NA> <NA>
13    C    1  0.4 <NA>
14    C    0    4 <NA>
15    C    1    1    4
16    C   40 <NA>    7
17    C    4 <NA>    7
18    C   10 <NA>    1
> str(df1)
'data.frame':   18 obs. of  4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: NA NA 3 4 2 4 NA 5 4 2 ...

STEP 2.

> df1$v1 <- as.numeric(df1$v1)
> df1$v2 <- as.numeric(df1$v2)
> df1$v3 <- as.numeric(df1$v3)
> df1
   site v1 v2 v3
1     A  4  7 NA
2     A  7  8 NA
3     A 10  5  3
4     A 13  4  4
5     A 11  9  2
6     A  6  2  4
7     B  5  3 NA
8     B NA  6  5
9     B NA NA  4
10    B NA  6  2
11    B 12 NA  3
12    B 11 NA NA
13    C  3  2 NA
14    C  2  6 NA
15    C  3  3  4
16    C  9 NA  6
17    C  8 NA  6
18    C  4 NA  2
> str(df1)
'data.frame':   18 obs. of  4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : num  4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : num  7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : num  NA NA 3 4 2 4 NA 5 4 2 ...
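
One thing worth noting about STEP 2: calling as.numeric() directly on a factor returns the internal level codes, not the printed values, which is why the first v1 entry shows 4 rather than 10 in the output above. The sketch below shows two possible ways around both problems at once. It is only an illustration under stated assumptions: the data is rebuilt from an inline string (the name txt, the shortened six-row sample, and the helper vector obs are made up for the example), and the real data is assumed to come from a text file that read.table() can parse.

```r
# Hypothetical six-row sample in the same layout as the posted data,
# with "." marking missing values.
txt <- "site v1 v2 v3
A 10 5 .
A 22 54 .
A 44 214 2
B . 4 5
B 51 . 2
C 1 0.4 ."

# Option 1: declare "." as the missing-value marker while reading,
# so the observation columns come in as numbers straight away.
df1 <- read.table(text = txt, header = TRUE, na.strings = ".",
                  stringsAsFactors = TRUE)
str(df1)

# Option 2: the data frame already exists with factor columns.
df2 <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# Every column except "site" is treated as an observation (an assumption).
obs <- setdiff(names(df2), "site")

# Convert all observation columns in one pass. Going through
# as.character() recovers the printed values instead of the level codes.
df2[obs] <- lapply(df2[obs], function(x) {
  x <- as.character(x)
  x[x == "."] <- NA
  as.numeric(x)
})
str(df2)
```

With a real file, Option 1 would presumably take the file name in place of the text argument. Option 2 scales to any number of columns because the conversion is applied to the whole set named in obs rather than one variable at a time.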