On Wed, May 2, 2012 at 2:05 PM, Igor Filippov <[email protected]> wrote:
> Dear Colleagues,
>
> I am following the tutorials at
> http://code.google.com/p/rdkit/wiki/BuildingModelsUsingDescriptors1
> and
> http://code.google.com/p/rdkit/wiki/BuildingModelsUsingFingerprints1
> to use RDKit to build a random forest model with floating point type
> descriptors. Perhaps someone can advise me on the following three points:
>
> 1) There doesn't seem to be a real-valued analogue to SigTreeBuilder,
> e.g. QuantTreeBuilder, what we have to use is QuantTreeBoot?
Correct: there is no QuantTreeBuilder, so QuantTreeBoot is the one to
use. The naming is not consistent.
> 2) The parameter needsQuantization is set to False in both cases -
> binary fingerprint or real-valued descriptor, naively I thought it
> should be True in the latter case?
needsQuantization is a historical artifact for dealing with
descriptors where *you* provide the quantization bounds. You probably
don't want to do that, so leaving it set to False for real-valued
descriptors is fine.
> 3) What precisely is the print out coming out of
> ScreenComposite.ShowVoteResults? I'm guessing it's the confusion matrix
> with something extra thrown in but I cannot find explanation of all the
> different numbers there...
Here's the example output:

    *** Vote Results ***
    misclassified: 93/242 (%38.43)   93/242 (%38.43)
    average correct confidence:    0.8520
    average incorrect confidence:  0.7673

    Results Table:

        72     61  |  68.57
        32     77  |  55.40
     ------- -------
      69.23  55.80

Here's what it means:
93 of 242 examples were misclassified. The correctly classified
examples had an average confidence of 0.85; the incorrectly classified
examples had an average confidence of 0.77. (Confidence is the fraction
of trees voting for the predicted result.)
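
To make the confidence definition concrete, here is a minimal sketch in
plain Python. It does not use RDKit at all: the function name
vote_confidence and the list of per-tree votes are made up for
illustration, and the real composite code may well break ties or store
votes differently.

```python
from collections import Counter


def vote_confidence(tree_votes):
    """Return (predicted_label, confidence) for a single example.

    tree_votes is a hypothetical list with one predicted label per tree.
    The prediction is the most common label; the confidence is the
    fraction of trees that voted for it.
    """
    counts = Counter(tree_votes)
    predicted, n_votes = counts.most_common(1)[0]
    return predicted, n_votes / len(tree_votes)


# Ten hypothetical trees, seven of which vote for class 1.
label, confidence = vote_confidence([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
print(label, confidence)
```

The two "average ... confidence" lines would then be the mean of this
number over the correctly and the incorrectly classified examples
respectively.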
The "Results Table" is a confusion matrix with summary statistics
presented at the end of each row/column. So 68.6% of the experimental
0s were correctly predicted in the above example while 55.4% of the
experimental 1s were correctly predicted.
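
If you want to check the arithmetic yourself, the sketch below computes
the misclassification count and the "diagonal over row total" and
"diagonal over column total" percentages from the four counts in the
table. It is plain Python written for this message, not the
ScreenComposite code; the function name summarize is hypothetical, and
which axis holds the experimental values is an assumption I am not
making here. With these counts it reproduces the 93/242 figure and the
two column summaries (69.23 and 55.80). It does not reproduce the
printed row summaries, so treat the row calculation as a guess at the
intent rather than a description of what the tool does.

```python
def summarize(table):
    """Summarize a square confusion matrix given as a list of rows.

    Returns (n_wrong, n_total, row_pct, col_pct), where row_pct[i] is
    100 * diagonal entry / row total and col_pct[i] is
    100 * diagonal entry / column total.
    """
    n = len(table)
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(n))
    row_pct = [100.0 * table[i][i] / sum(table[i]) for i in range(n)]
    col_pct = [100.0 * table[i][i] / sum(row[i] for row in table)
               for i in range(n)]
    return total - correct, total, row_pct, col_pct


# The four counts from the example output above.
table = [[72, 61],
         [32, 77]]
wrong, total, row_pct, col_pct = summarize(table)
print("misclassified: %d/%d (%.2f%%)" % (wrong, total, 100.0 * wrong / total))
print("column summaries:", ["%.2f" % p for p in col_pct])
print("row summaries:", ["%.2f" % p for p in row_pct])
```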
Does that help?
-greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss