I did this:

  nb <- naiveBayes(users, platform)
  pl <- predict(nb, users)
  nrow(users) ==> 314781
  ncol(users) ==> 109

1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
   (tens of minutes). Why?

2. The predict results were completely off the mark (quite the opposite
   of the expected overfitting). Suffice it to show the tables:

pl:
   android blackberry       ipad     iphone         lg      linux        mac
         3          5         11         14     312723          5         11
    mobile      nokia    samsung    symbian    unknown    windows
      1864         17         16        112          0          0

platform:
   android blackberry       ipad     iphone         lg      linux        mac
     18013       1221       2647       1328          4       2936      34336
    mobile      nokia    samsung    symbian    unknown    windows
        18         88         39        103       2660     251388

   i.e., nb classified nearly everything as "lg", while in the actual
   data "lg" is virtually nonexistent.

3. When I print "nb", I see "A-priori probabilities" (which are what I
   expected) and "Conditional probabilities", which are confusing
   because there are only two columns for each predictor, e.g.:

  android    0.048464998 0.43946764
  blackberry 0.001638002 0.04045564
  ipad       0.322251606 1.84940588
  iphone     0.030873494 0.23250250
  lg         0.000000000 0.00000000
  linux      0.023501362 0.34698919
  mac        0.082653774 1.22535027
  mobile     0.000000000 0.00000000
  nokia      0.000000000 0.00000000
  samsung    0.000000000 0.00000000
  symbian    0.000000000 0.00000000
  unknown    0.003759398 0.08219078
  windows    0.021158528 0.32916970

   The predictors are integers. Is the first column for the predictors
   equal to 0 and the second for all non-0 values? Is there a way to ask
   naiveBayes to differentiate between non-0 values?

Thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000