You can run simulations to find out how large N must be for split-sample validation to be precise enough to trust, that is, for different random splits to give the same estimate of model accuracy to within some small tolerance. You will be surprised how large N must be for this to happen. Consider resampling (e.g., the bootstrap or cross-validation) instead.
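A minimal sketch of such a simulation, written in Python with only the standard library. Everything specific here is an assumption made for illustration and not part of the original post: the data-generating model (a fair-coin class label with one Normal predictor), the midpoint-threshold classifier, the 80/20 split, and the function names. It draws one dataset of size N, repeats the random split many times, and reports how much the test-set accuracy varies from split to split.

```python
import random
import statistics


def simulate_dataset(n, rng):
    """Draw n (x, y) pairs: y is a fair coin flip, x ~ Normal(mean=y, sd=1).

    This data-generating model is an illustrative assumption.
    """
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(y, 1.0), y))
    return data


def split_accuracy(data, rng, train_frac=0.8):
    """One random split: fit a threshold rule on train, return test accuracy."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]

    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    if xs0 and xs1:
        # Classify as 1 when x exceeds the midpoint of the two class means.
        threshold = (statistics.mean(xs0) + statistics.mean(xs1)) / 2.0
    else:
        # Degenerate training set with one class missing: fall back to 0.5.
        threshold = 0.5

    correct = sum(1 for x, y in test if (1 if x > threshold else 0) == y)
    return correct / len(test)


def accuracy_spread(n, n_splits=100, seed=1):
    """Return (min, max, sd) of test accuracy over repeated random splits."""
    rng = random.Random(seed)
    data = simulate_dataset(n, rng)
    accs = [split_accuracy(data, rng) for _ in range(n_splits)]
    return min(accs), max(accs), statistics.stdev(accs)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        lo, hi, sd = accuracy_spread(n)
        print(f"N={n:6d}  min={lo:.3f}  max={hi:.3f}  sd={sd:.4f}")
```

With these assumptions the split-to-split spread should shrink only slowly as N grows, roughly with the square root of the test-set size, which is why N has to be large before any single split can be trusted. Swapping in your own data and model in place of `simulate_dataset` and the threshold rule gives the same check for a real problem.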
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University