Per your suggestion I ran chi.squared() against my training data and, to my delight, found just 50 parameters that were non-zero influencers. I built the model through several iterations and found n = 12 to be the optimum for the training data.
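For reference, the screening step is roughly the following sketch (not my exact code), assuming chi.squared() from the FSelector package and a data frame called train with a factor response called class; those names are placeholders, not my real column names:

library(FSelector)

## rank every predictor by its chi-squared association with the class
weights <- chi.squared(class ~ ., data = train)

## keep only the predictors with a non-zero score, strongest first
w    <- weights[order(-weights$attr_importance), , drop = FALSE]
keep <- rownames(w)[w$attr_importance > 0]
length(keep)   # 50 in my case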
However, the results are still not so good for the test data. Here are the results for both, with AUC values for n = 3 to 50: training data in the 0.97 range, test data in the 0.55 area.
http://r.789695.n4.nabble.com/file/n4639964/Feature_Selection_02.jpg

If the training and test data sets weren't so indistinguishable, I'd assume something weird about the test data, but I can't tell the two apart using any of the descriptive, 'meta' statistics I've tried so far. Having double-checked for dumb errors and still obtained the same results, I toasted everything and started from scratch, and I still get the same performance on the test data. Maybe I should take a break and reflect for 30 minutes.
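In case it helps to diagnose this, the evaluation I'm running is roughly the sketch below (again not my exact code). It uses naiveBayes() from e1071 and auc() from pROC to score the top-n predictors on both sets; train, test, class and keep are the same placeholder names as above:

library(e1071)
library(pROC)

auc_for_top_n <- function(n, train, test, ranked_vars, response = "class") {
  vars <- ranked_vars[seq_len(n)]
  fit  <- naiveBayes(train[, vars, drop = FALSE], train[[response]])

  ## posterior probability of the second class level (column 2 of the raw output)
  p_tr <- predict(fit, train[, vars, drop = FALSE], type = "raw")[, 2]
  p_te <- predict(fit, test[,  vars, drop = FALSE], type = "raw")[, 2]

  c(train = as.numeric(auc(train[[response]], p_tr)),
    test  = as.numeric(auc(test[[response]],  p_te)))
}

## AUC on training vs. test data for n = 3 to 50 selected predictors
res <- sapply(3:50, auc_for_top_n, train = train, test = test, ranked_vars = keep)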