Hello,

Please don't post in HTML; post in plain text, as the posting guide asks. Your data is unreadable.

Here are two test data sets, one with all columns numeric, the other with some columns numeric.


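# note: the \(x) lambda syntax needs R >= 4.1.0
set.seed(2021)         # arbitrary seed, so the NA positions are reproducible
wbpractice1 <- mtcars  # all columns are numeric
wbpractice2 <- iris    # not all columns are numeric
# set roughly 25% of each column to NA
wbpractice1[] <- lapply(wbpractice1, \(x){
  is.na(x) <- sample(length(x), 0.25*length(x))
  x
})
# same, but skip column 5 (Species, a factor)
wbpractice2[-5] <- lapply(wbpractice2[-5], \(x){
  is.na(x) <- sample(length(x), 0.25*length(x))
  x
})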


#---

If all columns are numeric, just lapply an anonymous function over them that replaces the values where is.na is TRUE with the column mean.


wbpractice1[] <- lapply(wbpractice1, \(x){
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})
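
To see what the replacement step does, here is a minimal sketch on a single made-up vector (the values and the name x are for illustration only, they are not taken from your data).

```r
# Made-up vector with two missing values (illustration only)
x <- c(2, NA, 4, NA, 6)

# Mean of the observed values: (2 + 4 + 6) / 3 = 4
m <- mean(x, na.rm = TRUE)

# Replace only the positions where is.na(x) is TRUE
x[is.na(x)] <- m
x
# [1] 2 4 4 4 6
```

The non-missing values are left as they were; only the two NA positions receive the mean.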


But if some columns are not numeric, first determine which ones are numeric, then apply the same code to that subset.


num_cols <- sapply(wbpractice2, is.numeric)
wbpractice2[num_cols] <- lapply(wbpractice2[num_cols], \(x){
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})
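
The same idea can be wrapped in a small function and tried on the seven rows from your post. This is only a sketch: the names impute_mean, wbsmall and wbclean are hypothetical, and the data frame is typed in by hand from the values you posted, so check it against your real file.

```r
# Hand-typed copy of the seven rows quoted below from the original post
wbsmall <- data.frame(
  household.id    = 1:7,
  hd17.perm       = c(2, 2, 6, 8, 6, 8, 8),
  hd17employ      = c("yes", "yes", "no", NA, NA, "yes", "no"),
  health.exp      = c(1654, NA, 2547, 2365, 1254, 1236, NA),
  total.food.exp  = c(23654, NA, 123311, 13648, 36549, 236541, 13264),
  total.nfood.exp = c(23655, 65984, 52416, 12544, 12365, 26522, 23698)
)

# Hypothetical helper: mean-impute the numeric columns, leave the rest alone
impute_mean <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  })
  df
}

wbclean <- impute_mean(wbsmall)
colSums(is.na(wbclean))  # NA count per column after imputation
```

On this small sample only hd17employ keeps its NAs, since a mean is not defined for a yes/no column.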


And here are dplyr solutions.


library(dplyr)

wbpractice1 %>%
  mutate(across(everything(), ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))

wbpractice2 %>%
  mutate(across(where(is.numeric), ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))



Hope this helps,

Rui Barradas


At 13:38 on 18/10/21, Admire Tarisirayi Chirume wrote:
Good day colleagues. Below is the csv file which I am using in my
analysis.



household.id  hd17.perm  hd17employ  health.exp  total.food.exp  total.nfood.exp
1             2          yes         1654        23654           23655
2             2          yes         NA          NA              65984
3             6          no          2547        123311          52416
4             8          NA          2365        13648           12544
5             6          NA          1254        36549           12365
6             8          yes         1236        236541          26522
7             8          no          NA          13264           23698





So I created a data frame from the above, which is a csv file, as follows:

wbpractice <- read.csv("world_practice.csv")

Now I am doing data cleaning and trying to replace all missing values with
the averages of the respective columns.

The dimensions of the actual dataset are:

dim(wbpractice)
[1] 31998    6

I used the following script, which I executed, but I got some error messages:

for(i in 1:ncol(wbpractice)){
  wbpractice[is.na(wbpractice[,i]), i] <- mean(wbpractice[,i], na.rm = TRUE)
}

Any help to replace all NAs with average values in my dataframe?






        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

