You can run simulations to find out how large N must be for split-sample
validation to be precise enough to trust, in other words, for different
random splits to give the same estimate of model accuracy to within some
small tolerance.  You will be surprised how large N must be for this to
happen.  Consider resampling instead.
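
A minimal sketch of the kind of simulation meant above, under assumptions
that are mine and not from the thread: one simulated predictor, a binary
outcome generated from a logistic model, a toy cutoff rule standing in for
the real model, and proportion classified correctly as the accuracy index
(chosen only because it is easy to compute; any other index could be
substituted).  The function names are hypothetical.  For each N the same
data set is split 80/20 many times, and the spread of the resulting
test-set estimates is reported.

```python
import math
import random
import statistics


def simulate_data(n, rng):
    """Draw n (x, y) pairs with P(y = 1 | x) = 1 / (1 + exp(-x))."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-x))
        data.append((x, 1 if rng.random() < p else 0))
    return data


def split_sample_accuracy(data, rng, test_frac=0.2):
    """Fit a cutoff on a random training part; return test-set accuracy."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(round(test_frac * len(shuffled)))
    test, train = shuffled[:n_test], shuffled[n_test:]
    x1 = [x for x, y in train if y == 1]
    x0 = [x for x, y in train if y == 0]
    # Toy "model" (an assumption for illustration): predict 1 when x
    # exceeds the midpoint of the two class means in the training part.
    cutoff = (statistics.fmean(x0) + statistics.fmean(x1)) / 2.0
    hits = sum((x > cutoff) == (y == 1) for x, y in test)
    return hits / len(test)


def split_spread(n, n_splits=100, seed=1):
    """Range (max - min) of the accuracy estimate over random 80/20 splits."""
    rng = random.Random(seed)
    data = simulate_data(n, rng)
    accs = [split_sample_accuracy(data, rng) for _ in range(n_splits)]
    return max(accs) - min(accs)


if __name__ == "__main__":
    # Print N next to the spread of the split-sample estimates.
    for n in (100, 1000, 10000):
        print(n, round(split_spread(n), 3))
```

Increase N until the printed spread falls below whatever tolerance you
regard as acceptable.  The data-generating model and the cutoff rule are
placeholders; substitute your own data-generating assumptions and model.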
Frank
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Splitting-data-into-test-and-train-80-20-kepping-attributes-similar-tp4583928p4589554.html
Sent from the R help mailing list archive at Nabble.com.