As some additional information, I re-ran the model across the range n = 50 to 150 (n being the 'top n' features returned by chi.squared), this time using a completely different subset of the data for both training and test. The results were nearly identical: typical train AUC about 0.98 and typical test AUC about 0.56. The other change I made: 30k records (instances) for training this time and 20k for test.
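For reference, the loop described above can be sketched in R roughly as follows. This is only an illustration of the setup, assuming FSelector for chi.squared ranking, e1071 for naiveBayes, and pROC for AUC, with hypothetical data frames `train`/`test` and a class column named `class`:

```r
library(FSelector)  # chi.squared feature ranking
library(e1071)      # naiveBayes
library(pROC)       # auc

# Rank all predictors once against the class label
weights <- chi.squared(class ~ ., train)

# Re-fit the model for each 'top n' feature subset and compare AUCs
for (n in seq(50, 150, by = 25)) {
  feats <- cutoff.k(weights, n)                       # names of top-n features
  fit   <- naiveBayes(train[, feats], train$class)
  p_tr  <- predict(fit, train[, feats], type = "raw")[, 2]
  p_te  <- predict(fit, test[, feats],  type = "raw")[, 2]
  cat(n, "train AUC:", auc(train$class, p_tr),
         "test AUC:",  auc(test$class,  p_te), "\n")
}
```

A persistent gap of 0.98 vs. 0.56 in a loop like this usually points to overfitting or leakage in the selected features rather than to the particular value of n.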
I'll check whether the set of class labels I'm using (I'm currently only running one of the 3 sets) is the least balanced, and if so, I'll switch to the most balanced one. However, I don't think any of the three sets is much better than 90/10.

--
View this message in context: http://r.789695.n4.nabble.com/Analyzing-Poor-Performance-Using-naiveBayes-tp4639825p4639985.html
Sent from the R help mailing list archive at Nabble.com.
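Checking the class balance of each label set is a one-liner in R. A minimal sketch, assuming hypothetical label vectors `y1`, `y2`, `y3` for the three sets:

```r
# Class proportions for each candidate label set;
# the set whose minority class is largest is the most balanced.
for (y in list(y1, y2, y3)) {
  print(prop.table(table(y)))
}
```

With a roughly 90/10 split, it may also be worth comparing AUC against a baseline that always predicts the majority class, since accuracy alone is misleading at that imbalance.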