Re: [R] How to validate model?

Pedro.Rodriguez Tue, 07 Oct 2008 17:16:15 -0700

Hi,

Yes, in my humble opinion, it doesn't make any sense to use the (2-class) ROC 
curve for a rating system. For example, if the classifier predicts 100% for all 
the defaulted exposures and 0% for the good clients, then even though we have a 
perfect classifier we have a bad rating system.


However, if we use the multi-class version of the AUC from Hand and Till (2001), 
we can test how well the model discriminates between classes or ratings. 

Hand, David J. and Robert J. Till, "A Simple Generalisation of the Area Under 
the ROC Curve for Multiple Class Classification Problems", Machine Learning, 
Vol. 45, No. 2, (November 2001), pp. 171-186.
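
A minimal sketch of the Hand and Till M measure in base R, for readers who want
to try it on a rating model. It averages the pairwise AUCs
A(i, j) = (A(i | j) + A(j | i)) / 2 over all pairs of classes. The function
names (auc_pair, hand_till_M), the toy ratings and the probability matrix are
hypothetical and only there to make the example run.

```r
# A(i | j): probability that a random member of class i gets a higher
# estimated probability of class i than a random member of class j.
# Computed from ranks (Mann-Whitney form); ties receive midranks.
auc_pair <- function(p_i, is_i) {
  r  <- rank(p_i)
  n1 <- sum(is_i)
  n0 <- sum(!is_i)
  (sum(r[is_i]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# M = mean over all class pairs of (A(i | j) + A(j | i)) / 2.
# prob: n x c matrix of predicted class probabilities, with column names
#       equal to the levels of the factor y (assumed layout).
hand_till_M <- function(prob, y) {
  pairs <- combn(levels(y), 2)
  a_ij <- apply(pairs, 2, function(pr) {
    keep <- y %in% pr            # only cases from the two classes compared
    yk   <- y[keep]
    a1 <- auc_pair(prob[keep, pr[1]], yk == pr[1])
    a2 <- auc_pair(prob[keep, pr[2]], yk == pr[2])
    (a1 + a2) / 2
  })
  mean(a_ij)
}

# Toy example: three rating grades, two cases each (made-up numbers).
y <- factor(c("A", "A", "B", "B", "C", "C"))
prob <- rbind(c(0.8, 0.1, 0.1),
              c(0.6, 0.3, 0.1),
              c(0.2, 0.7, 0.1),
              c(0.3, 0.5, 0.2),
              c(0.1, 0.2, 0.7),
              c(0.1, 0.3, 0.6))
colnames(prob) <- levels(y)
M <- hand_till_M(prob, y)
print(M)
```

A value near 1 means the grades are well separated pairwise; a value near 0.5
means the model does no better than chance at telling the grades apart.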

Regards,

Pedro 


-----Original Message-----
From: Ajay ohri [mailto:[EMAIL PROTECTED]]
Sent: Tue 10/7/2008 6:46 PM
To: Frank E Harrell Jr
Cc: Rodriguez, Pedro; r-help@r-project.org
Subject: Re: [R] How to validate model?
 
On the purpose of validating indirect measures such as ROC curves:

The biggest purpose: they are useful in a marketing/sales meeting context ;)

Also, decile-specific performance is easy to explain and monitor, which makes 
for faster execution and remodeling.

Regards,

Ajay


On Wed, Oct 8, 2008 at 4:01 AM, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:


        Ajay ohri wrote:
        

                This is an approach:

                Run the model variables on the holdout sample.

                Check and compare ROC curves between the build and validation
                datasets.

                Check for changes in parameter estimates (coefficients of
                variables), p-values and signs.

                Check for binning (response versus deciles of individual
                variables).

                Check concordance and the KS statistic.

                A decile-wise view of model performance, in terms of predicted
                versus actual and the rank ordering of deciles, helps in
                explaining the model to a business audience, who generally
                have some business-specific input that may require the scoring
                model to be tweaked.

                This assumes multicollinearity, outliers and missing-value
                treatment have already been handled, and that the holdout
                sample checks for overfitting. You can always rebuild the
                model using a different random holdout sample.

                A stable model would not change too much.

                In actual implementation, try to build real-time triggers for
                deviations (%) between predicted and actual.
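
A minimal sketch of two of these checks on a holdout sample, using base R only:
the decile-wise predicted-versus-actual table and the KS statistic. The
simulated data, the single predictor x and the object names (build, hold,
decile_tab) are assumptions made so that the example runs; they are not part
of the approach as described.

```r
set.seed(1)

# Simulated scorecard data (hypothetical): one predictor, binary outcome.
n   <- 10000
x   <- rnorm(n)
dat <- data.frame(x = x, y = rbinom(n, 1, plogis(-2 + x)))

# Random holdout sample of 5000 customers.
hold_idx <- sample(n, 5000)
build <- dat[-hold_idx, ]
hold  <- dat[hold_idx, ]

# Fit on the build sample, score the holdout sample.
fit    <- glm(y ~ x, family = binomial, data = build)
hold$p <- predict(fit, newdata = hold, type = "response")

# Deciles of predicted probability (1 = lowest risk, 10 = highest).
hold$decile <- cut(hold$p,
                   breaks = quantile(hold$p, probs = seq(0, 1, 0.1)),
                   include.lowest = TRUE, labels = 1:10)

# Mean predicted versus actual response rate per decile.
decile_tab <- aggregate(cbind(predicted = p, actual = y) ~ decile,
                        data = hold, FUN = mean)
decile_tab$n <- as.vector(table(hold$decile))

# Percentage deviation of actual from predicted, per decile.
decile_tab$dev_pct <- 100 * (decile_tab$actual - decile_tab$predicted) /
  decile_tab$predicted
print(decile_tab)

# KS statistic: largest gap between the score distributions of the
# responders (y == 1) and the non-responders (y == 0).
F1 <- ecdf(hold$p[hold$y == 1])
F0 <- ecdf(hold$p[hold$y == 0])
ks <- max(abs(F1(hold$p) - F0(hold$p)))
print(ks)
```

The dev_pct column is one possible basis for a deviation trigger; the
threshold to alert on is a business choice and is not set here.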
                
                Regards,
                
                Ajay
                


        I wouldn't recommend that approach, but legitimate differences of
        opinion exist on the subject.  In particular, I fail to see the
        purpose of validating indirect measures such as ROC curves.
        
        Frank
        
        


                www.decisionstats.com <http://www.decisionstats.com>
                
                On Wed, Oct 8, 2008 at 1:33 AM, Frank E Harrell Jr
                <[EMAIL PROTECTED]> wrote:


                   [EMAIL PROTECTED] wrote:
                
                       Hi Frank,
                
                       Thanks for your feedback! But I think we are talking
                       about two different things.

                       1) Validation: The generalization performance of the
                       classifier. See, for example, "Studies on the
                       Validation of Internal Rating Systems" by BIS.
                
                
                   I didn't think the desire was for a classifier but instead
                   was for a risk predictor.  If prediction is the goal,
                   classification methods or accuracy indexes based on
                   classifications do not work very well.
                
                
                
                       2) Calibration: Correct calibration of a PD rating
                       system means that the calibrated PD estimates are
                       accurate and conform to the observed default rates.
                       See, for instance, "An Overview and Framework for PD
                       Backtesting and Benchmarking" by Castermans et al.
                
                
                   I'm unclear on what you mean here.  Correct calibration of
                   a predictive system means that the UNcalibrated estimates
                   are accurate (i.e., they don't need any calibration).
                   (What is PD?)
                
                
                
                       Frank, you are referring to #1 and I am referring to
                       #2. Nonetheless, I would never create a rating system
                       if my model doesn't discriminate better than a coin
                       toss.
                
                
                   For sure
                   Frank
                
                
                
                       Regards,
                
                       Pedro
                
                
                
                
                
                
                       -----Original Message-----
                       From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED]]
                       Sent: Tuesday, October 07, 2008 11:02 AM
                       To: Rodriguez, Pedro
                       Cc: [EMAIL PROTECTED]; r-help@r-project.org
                       Subject: Re: [R] How to validate model?
                
                
                       [EMAIL PROTECTED] wrote:
                
                           Usually one validates scorecards with the ROC
                           curve, Pietra Index, KS test, etc. You may be
                           interested in WP 14 from BIS (www.bis.org).


                           Regards,
                
                           Pedro
                
                
                       No, the validation should be done using an absolute
                       reliability (calibration) curve.  You need to verify
                       that at all levels of predicted risk there is agreement
                       with the true probability of failure.  An ROC curve
                       does not do that, and I doubt the others do.  A
                       resampling-corrected loess calibration curve is a good
                       approach, as implemented in the Design package's
                       calibrate function.
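
A rough sketch of a calibration curve on a holdout sample, using base R only.
This is a plain loess smooth of the observed 0/1 outcome against predicted
risk, not the resampling-corrected curve from the Design package's calibrate
function. The simulated data and the deliberately overconfident predictions
(p_hat) are assumptions made to give the curve something to show.

```r
set.seed(2)

# Hypothetical holdout sample: true risk follows plogis(-2 + x).
n      <- 5000
x      <- rnorm(n)
p_true <- plogis(-2 + x)
y      <- rbinom(n, 1, p_true)

# Hypothetical model output that is too extreme (slope 1.5 instead of 1).
p_hat <- plogis(-2 + 1.5 * x)

# Smooth the 0/1 outcome against predicted probability.
cal <- loess(y ~ p_hat, degree = 1)

# Evaluate the smooth on a grid covering the central 98% of predictions.
rng  <- unname(quantile(p_hat, c(0.01, 0.99)))
grid <- seq(rng[1], rng[2], length.out = 50)
obs  <- predict(cal, newdata = data.frame(p_hat = grid))

# Calibration curve against the ideal 45-degree line.
plot(grid, obs, type = "l",
     xlab = "Predicted probability",
     ylab = "Observed probability (loess)")
abline(0, 1, lty = 2)

# Largest absolute gap between observed and predicted risk on the grid.
max_gap <- max(abs(obs - grid))
print(max_gap)
```

Where the solid line falls below the dashed line, predicted risk overstates
the observed failure rate; a well-calibrated model tracks the dashed line at
all levels of predicted risk.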
                
                       Frank
                
                           -----Original Message-----
                           From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
                           On Behalf Of Maithili Shiva
                           Sent: Tuesday, October 07, 2008 8:22 AM
                           To: r-help@r-project.org
                           Subject: [R] How to validate model?
                
                           Hi!
                
                           I am working on a scorecard model and I have
                           arrived at the regression equation. I have used
                           logistic regression in R.

                           My question is: how do I validate this model? I do
                           have a holdout sample of 5000 customers.

                           Please guide me. The problem is that I have never
                           used logistic regression before, nor am I used to
                           credit scoring models.
                
                           Thanks in advance
                
                           Maithili
                
                           ______________________________________________
                           R-help@r-project.org mailing list
                           https://stat.ethz.ch/mailman/listinfo/r-help
                           PLEASE do read the posting guide
                           http://www.R-project.org/posting-guide.html
                           and provide commented, minimal, self-contained,
                           reproducible code.
                
                
                
                
                
                
                   --
                   Frank E Harrell Jr   Professor and Chair           School of Medicine
                                        Department of Biostatistics   Vanderbilt University
                
                
                
                
                
                -- 
                Regards,
                
                Ajay Ohri
                http://tinyurl.com/liajayohri
                
                
                



        -- 
        Frank E Harrell Jr   Professor and Chair           School of Medicine
                            Department of Biostatistics   Vanderbilt University
        




-- 
Regards,

Ajay Ohri
http://tinyurl.com/liajayohri

