An essentially identical approach that may be a tad clearer -- but requires additional space -- first creates a logical vector marking the locations of the NA's in the unlisted data.frame. Further NA positions are then added at random among the non-NA entries, and the augmented vector is reshaped into a logical matrix that indexes where the NA's should go in the data frame:

df <- data.frame(a = c(1:3,NA,4:6), b = c(letters[1:6],NA), c = c(1,NA,runif(5)))
nr <- nrow(df); nc <- ncol(df)
p <- .3 ## desired total proportion of NA's
ina <- is.na(unlist(df)) ## logical vector, TRUE corresponds to NA positions
n2 <- floor(p*nr*nc) - sum(ina) ## number of new NA's
## sample only among positions that are not already NA
ina[sample(which(!ina), n2)] <- TRUE
df[matrix(ina, nrow = nr, ncol = nc)] <- NA ## using matrix indexing
df

Cheers,
Bert

On Fri, Nov 29, 2013 at 10:09 AM, arun <smartpink...@yahoo.com> wrote:
> Hi,
> I used that because 10% of the values in the data were already NA.
>
>
> You are right. Sorry, ?match() is unnecessary. I was trying another
> solution with match() which didn't work out and forgot to check whether it
> was adequate or not.
> set.seed(49)
> dat1[!is.na(dat1)][sample(seq(dat1[!is.na(dat1)]),length(dat1[!is.na(dat1)])*(0.20))]
> <- NA
> A.K.
>
>
> Thanks for the reply. I don't get the 0.20 multiplied by the length of the
> non NA value, where did you take it from?
>
> Furthermore, why do we have to use the function match? Wouldn't it be enough
> to use the sample function?
>
>
> On Thursday, November 28, 2013 12:57 PM, arun <smartpink...@yahoo.com> wrote:
> Hi,
> One way would be:
> set.seed(42)
> dat1 <-
> as.data.frame(matrix(sample(c(1:5,NA),50,replace=TRUE,prob=c(10,15,15,20,30,10)),ncol=5))
> set.seed(49)
> dat1[!is.na(dat1)][ match(
> sample(seq(dat1[!is.na(dat1)]),length(dat1[!is.na(dat1)])*(0.20)),seq(dat1[!is.na(dat1)]))]
> <- NA
> length(dat1[is.na(dat1)])/length(unlist(dat1))
> #[1] 0.28
>
> A.K.
>
>
> Hello, I'm quite new at R so I don't know which is the most efficient
> way to execute a function that I could write easily in other languages.
>
> This is my problem: I have a dataframe with a certain number of
> NA values (approximately 10%). I want to add other NA values in random
> positions of the dataframe until reaching an overall proportion of NA
> values of 30% (clearly the positions with NA values don't have to
> change).
> I tried looking at iterative functions in R such as apply or sapply
> but I can't actually figure out how to use them in this case. Thank you.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374