Per your suggestion I ran chi.squared() against my training data and, to my delight, found just 50 attributes with non-zero importance weights. I built the model through several iterations and found n = 12 top-ranked attributes to be the optimum for the training data.
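
For reference, the selection step looks roughly like this (a minimal sketch using FSelector and e1071; 'train' and 'Class' are placeholder names, not my actual data):

library(FSelector)   # chi.squared(), cutoff.k(), as.simple.formula()
library(e1071)       # naiveBayes()

## Rank every predictor by its chi-squared association with the class
weights <- chi.squared(Class ~ ., data = train)

## Keep only the attributes with a non-zero importance weight
nonzero <- weights[weights$attr_importance > 0, , drop = FALSE]

## Fit naive Bayes on the top n ranked attributes (here n = 12)
n    <- 12
top  <- cutoff.k(nonzero, n)
form <- as.simple.formula(top, "Class")
fit  <- naiveBayes(form, data = train)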

However, the results are still not so good for the test data. Here are the results for both, with the AUC values for n = 3 to 50: training data in the 0.97 range, test data in the 0.55 area.

http://r.789695.n4.nabble.com/file/n4639964/Feature_Selection_02.jpg 
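
The AUC values in that plot come from a loop roughly like the one below (again a sketch, continuing from the code above; ROCR is assumed, 'test' is a placeholder for the held-out data frame, and the second factor level of Class is treated as the positive class):

library(ROCR)   # prediction(), performance()

auc_for <- function(fit, data) {
  ## posterior probability of the second class level
  p <- predict(fit, data, type = "raw")[, 2]
  performance(prediction(p, data$Class), "auc")@y.values[[1]]
}

results <- t(sapply(3:50, function(n) {
  top <- cutoff.k(nonzero, n)
  fit <- naiveBayes(as.simple.formula(top, "Class"), data = train)
  c(n = n, train_auc = auc_for(fit, train), test_auc = auc_for(fit, test))
}))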

If the training and test data sets were distinguishable, I'd assume something weird about the test data, but I can't tell the two apart using any descriptive, 'meta' statistics I've tried so far. Having double-checked for dumb errors and still obtained the same results, I toasted everything and started from scratch; the performance on the test data is still the same.
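
To give an idea of the kind of 'meta' check I mean, here is a sketch (not necessarily exactly what I ran) that compares the train and test distributions of each selected numeric attribute with a two-sample Kolmogorov-Smirnov test, using the names in 'top' from the code above:

## Small p-values would flag attributes whose train and test
## distributions differ (numeric attributes only)
ks_p <- sapply(top, function(v) ks.test(train[[v]], test[[v]])$p.value)
head(sort(ks_p), 5)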

Maybe I'll take a break and reflect for 30 minutes.




