On Mon, May 7, 2012 at 7:05 PM, Igor Filippov <[email protected]> wrote:
>
>> Here's the example output:
>>
>>
>> *** Vote Results ***
>> misclassified: 93/242 (%38.43)  93/242 (%38.43)
>>
> Why the same set of numbers is printed twice?
If you do the predictions with a confidence threshold, the two numbers will be different. One will then be "accuracy relative to the predictions made" while the other is "accuracy relative to the whole data set".

>
>> average correct confidence: 0.8520
>> average incorrect confidence: 0.7673
>>
>> Results Table:
>>
>>      72      61 | 68.57
>>      32      77 | 55.40
>>  ------- -------
>>   69.23   55.80
>>
>
> If I try to compute percentages I'm getting for example
> 72/(72+61) = 54.1% not 68.57% or any other percentage I see there?
> However 72/(72+32) = 69.23% just as it should...

Looks like that's a bug.

FYI: I've been spending some time recently looking at scikit-learn and have been quite impressed... there's a bit of a writeup here:
http://code.google.com/p/rdkit/wiki/WorkingWithSciKitLearn
It's definitely worth taking a look at.

-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
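
For anyone who wants to check the arithmetic on the results table by hand, here is a minimal sketch in plain Python. It only recomputes the diagonal count over each row sum and over each column sum from the four counts quoted above; the names `table`, `percent`, `row_pct` and `col_pct` are made up for this illustration and are not part of RDKit. The thread does not say which axis holds the true classes and which the predicted ones, so the sketch makes no assumption about that.

```python
# Counts copied from the "Results Table" quoted in the thread.
table = [[72, 61],
         [32, 77]]


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


n = len(table)

# Diagonal entry divided by the sum of its row.
row_pct = [percent(table[i][i], sum(table[i])) for i in range(n)]

# Diagonal entry divided by the sum of its column.
col_pct = [percent(table[i][i], sum(row[i] for row in table)) for i in range(n)]

print("row percentages:   ", ["%.2f" % p for p in row_pct])
print("column percentages:", ["%.2f" % p for p in col_pct])
```

The column values come out as 69.23 and 55.80, matching the bottom line of the table. The row values come out as 54.14 and 70.64, which do not match the printed 68.57 and 55.40. That is the discrepancy Igor points out.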

