Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-26 Thread Jessica Streicher
Ad_1 <- subset(Attrition_data_1, Attrition_ind == "1")
Ad_0 <- subset(Attrition_data_1, Attrition_ind == "0")
s1 <- sample(1:dim(Ad_0)[1], 0.8 * dim(Ad_0)[1])  # 80% of the non-attritees
s2 <- sample(1:dim(Ad_1)[1], 0.8 * dim(Ad_1)[1])  # 80% of the attritees
s3 <- Ad_0[-s1, ]
summary(s3)
s4 <- Ad_1[-s2, ]
summary(s4)

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-26 Thread Dwaipayan Dasgupta
Behalf Of Jessica Streicher
Sent: Thursday, April 26, 2012 5:07 PM
To: r-help@r-project.org
Subject: Re: [R] Splitting data into test and train (80:20) kepping attributes similar

Be reminded that s1 and s2 are only the indexes into Ad_0 and Ad_1 of the rows you want to keep. Therefore traindata
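The point that s1 and s2 are row indexes, not data, can be sketched as follows. This is a minimal illustration on made-up data, not the poster's actual solution: the toy data frame, the tenure column, and the names traindata and testdata are assumptions (the message is cut off after "traindata"). Only Attrition_data_1, Attrition_ind, Ad_0, Ad_1, s1 and s2 come from the thread.

```r
set.seed(42)  # make the random split reproducible

# Toy stand-in for the poster's data: 1000 rows, roughly 20% attritees (assumed)
Attrition_data_1 <- data.frame(
  tenure        = round(runif(1000, 0, 10), 1),
  Attrition_ind = rbinom(1000, 1, 0.2)
)

Ad_1 <- subset(Attrition_data_1, Attrition_ind == 1)  # attritees
Ad_0 <- subset(Attrition_data_1, Attrition_ind == 0)  # non-attritees

# Row positions (not the rows themselves) of the 80% kept for training, per class
s1 <- sample(nrow(Ad_0), floor(0.8 * nrow(Ad_0)))
s2 <- sample(nrow(Ad_1), floor(0.8 * nrow(Ad_1)))

# Positive indexing keeps the sampled rows; negative indexing drops them
traindata <- rbind(Ad_0[s1, ], Ad_1[s2, ])
testdata  <- rbind(Ad_0[-s1, ], Ad_1[-s2, ])

# The attrition rate should be nearly the same in both parts
print(mean(traindata$Attrition_ind))
print(mean(testdata$Attrition_ind))
```

Sampling within each class separately is what keeps the attritee to non-attritee ratio the same in both parts; a single sample() over all rows would only match it on average.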

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-26 Thread Jessica Streicher
> Would you help please
>
> Thanks,
> Dwaipayan
>
> From: Jessica Streicher [mailto:j.streic...@micromata.de]
> Sent: Wednesday, April 25, 2012 9:25 PM
> To: Dwaipayan Dasgupta
> Cc: r-help@r-project.org
> Subject: Re: [R] Splitting data into test and train (80:20) keppi

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-26 Thread Frank Harrell
You can run simulations to find out how large N must be so that split sample validation yields sufficient precision to be trustworthy, in other words, that different random splits provide the same estimate of model accuracy to within some small tolerance. You will be surprised how large N must be
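A simulation of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the logistic data-generating model, the sample sizes, the number of repeated splits, and the use of the misclassification rate as the accuracy measure. None of these choices come from the message itself.

```r
set.seed(1)

# Simulate one data set of size n from an assumed logistic model
make_data <- function(n) {
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(-1.5 + x))
  data.frame(x = x, y = y)
}

# Test-set misclassification rate for one random 80:20 split of d
split_error <- function(d) {
  train_idx <- sample(nrow(d), floor(0.8 * nrow(d)))
  fit  <- glm(y ~ x, family = binomial, data = d[train_idx, ])
  pred <- predict(fit, newdata = d[-train_idx, ], type = "response") > 0.5
  mean(pred != (d$y[-train_idx] == 1))
}

# Spread of the estimate over repeated random splits of the SAME data set
split_spread <- function(n, reps = 100) {
  d <- make_data(n)
  errs <- replicate(reps, split_error(d))
  c(n = n, sd = sd(errs), range = diff(range(errs)))
}

spread <- t(sapply(c(200, 1000, 5000), split_spread))
print(round(spread, 4))
```

The sd and range columns show how much the estimate moves from one random split to the next on the same data. Extending the vector of sample sizes until the range falls below a chosen tolerance gives a rough answer to how large N must be.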

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-26 Thread Dwaipayan Dasgupta
M
To: Dwaipayan Dasgupta
Cc: r-help@r-project.org
Subject: Re: [R] Splitting data into test and train (80:20) kepping attributes similar

Don't know what's wrong there (except if you're using the Eclipse R plugin on a Mac like me and the window for choosing the download site doesn't pop

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-25 Thread Jessica Streicher
"sample.split" >> Could you please help >> >> Thanks in advance >> doy >> >> >> -Original Message- >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On >> Behalf Of Dwaipayan Dasgupta >> Sent: Tuesd

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-25 Thread Dwaipayan Dasgupta
roject.org
Subject: Re: [R] Splitting data into test and train (80:20) kepping attributes similar

Well, it throws an error because there is no such function in base R. A bit of googling showed it might be the one in the caTools package. Execute this:

install.packages("caTools")
libr
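Once caTools is installed, a stratified 80:20 split with sample.split could look roughly like this. It is a hedged sketch on made-up data: the toy data frame, the tenure column, and the names in_train, traindata and testdata are assumptions. The requireNamespace() guard is only there so the snippet does not fail where the package is missing.

```r
# install.packages("caTools")  # one-time install, as suggested in the thread

set.seed(123)

# Toy stand-in for the poster's data (assumed): about 20% attritees
Attrition_data_1 <- data.frame(
  tenure        = round(runif(500, 0, 10), 1),
  Attrition_ind = rbinom(500, 1, 0.2)
)

if (requireNamespace("caTools", quietly = TRUE)) {
  # sample.split returns a logical vector: TRUE marks rows for the training set,
  # and the class proportions of the label vector are preserved in both parts
  in_train <- caTools::sample.split(Attrition_data_1$Attrition_ind,
                                    SplitRatio = 0.8)

  traindata <- Attrition_data_1[in_train, ]
  testdata  <- Attrition_data_1[!in_train, ]

  # Compare class proportions in the two parts
  print(prop.table(table(traindata$Attrition_ind)))
  print(prop.table(table(testdata$Attrition_ind)))
} else {
  message("caTools is not installed; run install.packages(\"caTools\") first")
}
```

Note that sample.split takes the label vector, not the whole data frame, and that its result is used to index rows rather than being the split data itself.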

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-25 Thread Jessica Streicher
ample.split"
> Could you please help
>
> Thanks in advance
> doy
>
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Dwaipayan Dasgupta
> Sent: Tuesday, April 24, 2012 9:08 PM
> To: r-h

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-25 Thread Dwaipayan Dasgupta
lease help

Thanks in advance
doy

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dwaipayan Dasgupta
Sent: Tuesday, April 24, 2012 9:08 PM
To: r-help@r-project.org
Subject: [R] Splitting data into test and train (80:20) kepping

[R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-24 Thread Dwaipayan Dasgupta
Hi, I am trying to do some predictive modeling around attrition and want to split the dataset into test and train (80:20) while keeping the ratio of attritees to non-attritees the same. In my dataset the attrition indicator is coded as 0 (for non-attritees) and 1 (for attritees) and I want to keep the ratio o