You should figure out how many samples you want for Y=1 and Y=0, then sample from the relevant subset, e.g. dfrm[dfrm$Y==1, ], by sampling row.names(dfrm[dfrm$Y==1, ]) with replace=FALSE. See ?sample.
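
For what it's worth, here is a rough sketch of that approach for the 90%/10% split asked about below. The data frame contents, the seed, and the names ones, zeros, n0, keep0 and under are made up for illustration; only dfrm and Y come from the thread. It keeps every row with Y == 1 and draws nine Y == 0 rows per 1 without replacement, which is one way to get that ratio, not necessarily the only sensible one.

```r
# Hypothetical example data: a data frame 'dfrm' with a rare binary outcome Y
set.seed(1)
dfrm <- data.frame(x = rnorm(5000), Y = rbinom(5000, 1, 0.02))

# Row positions of each class (which() also drops any NA in Y)
ones  <- which(dfrm$Y == 1)
zeros <- which(dfrm$Y == 0)

# Keep all 1s; take 9 zeros per 1 so the result is 90% zeros / 10% ones.
# min() guards against asking for more zeros than exist.
n0 <- min(length(zeros), 9 * length(ones))

# sample.int() draws positions without replacement (replace = FALSE is the default)
keep0 <- zeros[sample.int(length(zeros), n0)]

under <- dfrm[c(ones, keep0), ]

# Check the resulting class proportions
prop.table(table(under$Y))
```

Indexing by position with which() rather than by row names avoids surprises if the row names are not unique-looking character strings, and sample.int() sidesteps the well-known sample() behaviour when its first argument has length one.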
On Mon, Oct 31, 2011 at 8:18 PM, Comcast <dwinsem...@comcast.net> wrote:
>
> On Oct 31, 2011, at 1:54 PM, loubna ibn majdoub hassani <loubn...@gmail.com> wrote:
>
>> Hi,
>> I have an unbalanced data set where I want to predict a binary variable Y.
>> I want to do an under-sampling by keeping all the 1s and taking just some of
>> the 0s, such that I'll have 90% 0s and 10% 1s.
>
> You haven't given much detail, but something like this will take all of the
> 1's and 10% of the 0's:
>
> dfrm[c(rownames(dfrm[dfrm$Y==1, ]),
>        sample(rownames(dfrm[dfrm$Y==0, ]), round(0.10 * sum(dfrm$Y==0)))), ]
>
>> Can you help me do that?
>> Thank you
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.