Hello,
Please don't post in HTML; post in plain text, as the posting guide
asks. Your data is unreadable.
Here are two test data sets, one with all columns numeric, the other
with some columns numeric.
wbpractice1 <- mtcars  # all columns are numeric
wbpractice2 <- iris    # not all columns are numeric
# set 25% of the values in each column to NA, at random
wbpractice1[] <- lapply(wbpractice1, \(x){
  is.na(x) <- sample(length(x), 0.25*length(x))
  x
})
# same, but skip column 5 (Species, a factor)
wbpractice2[-5] <- lapply(wbpractice2[-5], \(x){
  is.na(x) <- sample(length(x), 0.25*length(x))
  x
})
#---
If all columns are numeric, just lapply an anonymous function over
them, replacing the values where is.na is TRUE with the column mean.
wbpractice1[] <- lapply(wbpractice1, \(x){
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})
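A quick sanity check of the approach above, sketched on a small made-up data frame (`toy` and its values are hypothetical, not from the original post): after the replacement no NA should remain, and the column means should be unchanged.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(a = c(1, NA, 3), b = c(NA, 10, 20))

# Column means before imputation, ignoring the NAs
means_before <- colMeans(toy, na.rm = TRUE)

# Same replacement as above: fill each column's NAs with that column's mean
toy[] <- lapply(toy, \(x){
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})

anyNA(toy)                              # FALSE
all.equal(colMeans(toy), means_before)  # TRUE
```

Filling with the mean leaves each column's mean as it was, but it does shrink the variance, which may matter for later analysis.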
But if some columns are not numeric, first determine which ones are
numeric, then apply the same code to that subset.
num_cols <- sapply(wbpractice2, is.numeric)
wbpractice2[num_cols] <- lapply(wbpractice2[num_cols], \(x){
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})
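As a sketch of how this might apply to the data in the question, the seven sample rows from the original post can be typed in by hand and passed through a small helper. The names `wb_sample`, `wb_clean` and `impute_mean` are made up for illustration. Calling mean() on a non-numeric column such as hd17employ returns NA with a warning, which is probably what the loop in the question ran into.

```r
# The seven sample rows shown in the original question, typed in by hand
wb_sample <- data.frame(
  household.id    = 1:7,
  hd17.perm       = c(2, 2, 6, 8, 6, 8, 8),
  hd17employ      = c("yes", "yes", "no", NA, NA, "yes", "no"),
  health.exp      = c(1654, NA, 2547, 2365, 1254, 1236, NA),
  total.food.exp  = c(23654, NA, 123311, 13648, 36549, 236541, 13264),
  total.nfood.exp = c(23655, 65984, 52416, 12544, 12365, 26522, 23698)
)

# Hypothetical helper: mean-impute the numeric columns, leave the rest alone
impute_mean <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], \(x){
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  })
  df
}

wb_clean <- impute_mean(wb_sample)
colSums(is.na(wb_clean))  # only hd17employ still has NAs
```

The missing yes/no values in hd17employ are left as NA on purpose, since a mean is not defined for them.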
And here are dplyr solutions.
library(dplyr)
wbpractice1 %>%
  mutate(across(everything(),
                ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
wbpractice2 %>%
  mutate(across(where(is.numeric),
                ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
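Note that the pipelines above only print their result. A minimal sketch of keeping it, on a made-up data frame (`toy2` and `toy2_filled` are hypothetical names, and dplyr is assumed to be installed):

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy2 <- data.frame(
  g = c("a", "b", "c"),
  x = c(1, NA, 3)
)

# mutate() returns a new data frame, so assign it to keep the result
toy2_filled <- toy2 %>%
  mutate(across(where(is.numeric),
                ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))

toy2_filled$x   # 1 2 3
anyNA(toy2$x)   # TRUE, the original data frame is unchanged
```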
Hope this helps,
Rui Barradas
At 13:38 on 18/10/21, Admire Tarisirayi Chirume wrote:
Good day colleagues. Below is the CSV file I am using in my
analysis.
household.id  hd17.perm  hd17employ  health.exp  total.food.exp  total.nfood.exp
1             2          yes         1654        23654           23655
2             2          yes         NA          NA              65984
3             6          no          2547        123311          52416
4             8          NA          2365        13648           12544
5             6          NA          1254        36549           12365
6             8          yes         1236        236541          26522
7             8          no          NA          13264           23698
So I created a data frame from the above, which is a CSV file, as follows:
wbpractice <- read.csv("world_practice.csv")
Now I am doing data cleaning and trying to replace all missing values with
the averages of the respective columns.
The dimensions of the actual dataset are:
dim(wbpractice)
[1] 31998 6
I used the following script, but when I executed it I got some error messages:
for (i in 1:ncol(wbpractice)) {
  wbpractice[is.na(wbpractice[, i]), i] <- mean(wbpractice[, i], na.rm = TRUE)
}
Any help to replace all NAs with average values in my dataframe?
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.