Re: [R] How to validate model?

Frank E Harrell Jr Tue, 07 Oct 2008 20:23:35 -0700

Ajay ohri wrote:

  the purpose of validating indirect measures such as ROC curves.
Biggest purpose: it is useful in a marketing/sales meeting context ;)

That is far from clear. It seems that ROC curves are being used to impress non-statisticians more than to shed light on the subject.

Also, decile-specific performance is easy to explain and monitor for faster execution/remodeling.

That's too low a resolution. loess is superior for estimating the calibration curve.
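A minimal sketch of the contrast, written in Python rather than R and run on simulated data (the data, the sample size and the function names `decile_table` and `smooth_calibration` are hypothetical). `decile_table` yields the ten predicted-versus-observed points of a decile chart; `smooth_calibration` is a simplified, non-robust lowess (local-linear fit with tricube weights) that estimates the observed event rate at any level of predicted risk. It is not R's `loess` and does no resampling correction.

```python
import random

def decile_table(pred, y, groups=10):
    """Mean predicted vs observed event rate within groups of sorted predictions."""
    pairs = sorted(zip(pred, y))
    n = len(pairs)
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(yi for _, yi in chunk) / len(chunk)
        rows.append((round(mean_pred, 3), round(observed, 3)))
    return rows

def smooth_calibration(pred, y, grid, frac=0.5):
    """Local-linear fit with tricube weights over the nearest frac*n points:
    a simplified, non-robust lowess estimate of P(Y=1 | predicted = p0)."""
    k = max(3, int(frac * len(pred)))
    curve = []
    for p0 in grid:
        # bandwidth = distance from p0 to its k-th nearest prediction
        h = sorted(abs(p - p0) for p in pred)[k - 1] or 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for p, yi in zip(pred, y):
            u = abs(p - p0) / h
            if u >= 1:
                continue
            w = (1 - u ** 3) ** 3  # tricube weight
            sw += w
            swx += w * p
            swy += w * yi
            swxx += w * p * p
            swxy += w * p * yi
        if sw == 0:
            continue  # no neighbours inside the window
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fit = swy / sw  # all neighbours share one value: weighted mean
        else:
            # weighted least-squares line through the neighbours
            slope = (sw * swxy - swx * swy) / denom
            fit = (swy - slope * swx) / sw + slope * p0
        curve.append((p0, min(1.0, max(0.0, fit))))
    return curve

# Simulated hold-out sample (hypothetical): the model over-predicts because
# the true risk is p**1.5 while the model reports p.
random.seed(1)
pred = [random.uniform(0.02, 0.98) for _ in range(5000)]
y = [1 if random.random() < p ** 1.5 else 0 for p in pred]

print(decile_table(pred, y))  # only 10 (predicted, observed) points
grid = [i / 20 for i in range(1, 20)]
for p0, obs in smooth_calibration(pred, y, grid):
    print(f"predicted {p0:.2f}  smoothed observed {obs:.3f}")
```

The decile chart summarizes each tenth of the sample by one point, while the smoother can be read off at any predicted risk.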


Frank


Regards,

Ajay

On Wed, Oct 8, 2008 at 4:01 AM, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:


    Ajay ohri wrote:

        This is an approach

        Run the model variables on a hold-out sample.

        Check and compare ROC curves between build and validation datasets.

        Check for changes in parameter estimates (coefficients of
        variables), p-values and signs.

        Check for binning (response versus deciles of individual variables).

        Check concordance and the KS statistic.
        A decile-wise comparison of predicted versus actual performance,
        including the rank ordering of deciles, helps in explaining the
        model to a business audience, who generally have some
        business-specific input that may require the scoring model to be
        tweaked.

        This assumes that multicollinearity, outlier and missing-value
        treatment have already been done, and that the hold-out sample
        checks for overfitting. You can always rebuild the model using a
        different random hold-out sample.

        A stable model would not change too much.

        In actual implementation, try to build real-time triggers for
        deviations (%) between predicted and actual.
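The concordance and KS checks in the list above can be sketched as follows, in Python rather than R; the function names and the four-case data set are hypothetical. Both functions depend only on how the scores rank events against non-events, so comparing them between the build and hold-out samples checks discrimination, not calibration.

```python
def c_index(pred, y):
    """Concordance probability (ROC area) via average ranks (Mann-Whitney U)."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    ranks = [0.0] * len(pred)
    i = 0
    while i < len(order):
        j = i
        # extend j over a block of tied predictions
        while j + 1 < len(order) and pred[order[j + 1]] == pred[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1
    n1 = sum(y)
    n0 = len(y) - n1
    rank_sum = sum(r for r, yi in zip(ranks, y) if yi)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)

def ks_statistic(pred, y):
    """Largest gap between the cumulative score distributions of events
    and non-events."""
    n1 = sum(y)
    n0 = len(y) - n1
    pairs = sorted(zip(pred, y))
    c1 = c0 = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        v = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == v:  # consume ties together
            if pairs[i][1]:
                c1 += 1
            else:
                c0 += 1
            i += 1
        best = max(best, abs(c1 / n1 - c0 / n0))
    return best

# Hypothetical hold-out scores and outcomes
pred = [0.1, 0.4, 0.35, 0.8]
y = [0, 0, 1, 1]
print(c_index(pred, y), ks_statistic(pred, y))  # 0.75 0.5
```

A c-index of 0.5 corresponds to the coin toss mentioned later in the thread.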

        Regards,

        Ajay


    I wouldn't recommend that approach but legitimate differences of
    opinion exist on the subject.  In particular I fail to see the
    purpose of validating indirect measures such as ROC curves.

    Frank


        www.decisionstats.com

        On Wed, Oct 8, 2008 at 1:33 AM, Frank E Harrell Jr
        <[EMAIL PROTECTED]> wrote:


           [EMAIL PROTECTED] wrote:

               Hi Frank,

               Thanks for your feedback! But I think we are talking about two
               different things.

               1) Validation: the generalization performance of the classifier.
               See, for example, "Studies on the Validation of Internal Rating
               Systems" by BIS.


           I didn't think the desire was for a classifier but instead was for a
           risk predictor. If prediction is the goal, classification methods
           or accuracy indexes based on classifications do not work very well.
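To illustrate why accuracy indexes based on classifications work poorly when risk prediction is the goal, here is a toy Python example with hypothetical predictions. Two sets of predicted probabilities give identical classifications at a 0.5 cutoff, and hence identical accuracy, yet one is far more useful as a risk estimate. The Brier score (mean squared error of the probabilities) is our choice of comparison measure and is not mentioned in the thread.

```python
def accuracy(pred, y, cutoff=0.5):
    """Proportion classified correctly after dichotomizing at the cutoff."""
    return sum((p >= cutoff) == bool(yi) for p, yi in zip(pred, y)) / len(y)

def brier(pred, y):
    """Mean squared error of the predicted probabilities (lower is better)."""
    return sum((p - yi) ** 2 for p, yi in zip(pred, y)) / len(y)

y = [1, 1, 0, 0, 1, 0]
pred_a = [0.9, 0.8, 0.2, 0.1, 0.4, 0.6]      # hypothetical, graded risks
pred_b = [0.51, 0.51, 0.49, 0.49, 0.01, 0.99]  # hypothetical, same classes

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, round(accuracy(pred, y), 3), round(brier(pred, y), 3))
# A 0.667 0.137
# B 0.667 0.487
```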



               2) Calibration: correct calibration of a PD rating system means
               that the calibrated PD estimates are accurate and conform to the
               observed default rates. See, for instance, "An Overview and
               Framework for PD Backtesting and Benchmarking" by Castermans et al.


           I'm unclear on what you mean here. Correct calibration of a
           predictive system means that the UNcalibrated estimates are accurate
           (i.e., they don't need any calibration). (What is PD?)
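In symbols (the notation is ours, not from the thread): let \hat{p}(X) be the model's predicted probability of the event (default, in the credit setting, where PD is the usual abbreviation for probability of default). The calibration curve, and the condition that the uncalibrated estimates are already accurate, are

```latex
c(p) = \Pr\{\, Y = 1 \mid \hat{p}(X) = p \,\}, \qquad
\text{the model is well calibrated when } c(p) = p \text{ for all } p \in [0, 1].
```

A decile chart estimates c(p) at ten grouped points; a loess fit of Y on \hat{p} estimates it over the whole range of predicted risk.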



               Frank, you are referring to #1 and I am referring to #2.
               Nonetheless, I would never create a rating system if my model
               doesn't discriminate better than a coin toss.


           For sure
           Frank



               Regards,

               Pedro






               -----Original Message-----
               From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED]]
               Sent: Tuesday, October 07, 2008 11:02 AM
               To: Rodriguez, Pedro
               Cc: [EMAIL PROTECTED]; r-help@r-project.org
               Subject: Re: [R] How to validate model?

               [EMAIL PROTECTED] wrote:

                   Usually one validates scorecards with the ROC curve, the Pietra
                   index, the KS test, etc. You may be interested in WP 14 from the
                   BIS (www.bis.org).


                   Regards,

                   Pedro


               No, the validation should be done using an absolute reliability
               (calibration) curve. You need to verify that at all levels of
               predicted risk there is agreement with the true probability of
               failure. An ROC curve does not do that, and I doubt the others do.
               A resampling-corrected loess calibration curve is a good approach,
               as implemented in the Design package's calibrate function.
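Why an ROC curve cannot check agreement with the true probability can be seen in a toy Python example (hypothetical data and function name). The ROC area, i.e. the concordance index, depends only on how the predictions rank the cases, so cubing every prediction leaves it unchanged while the predicted risks become far too low. This sketch does not reproduce the resampling-corrected curve of Design's calibrate function.

```python
def c_index(pred, y):
    """Concordance (ROC area): share of event/non-event pairs in which the
    event has the higher prediction; ties count one half. O(n^2) for clarity."""
    conc = ties = pairs = 0
    for pi, yi in zip(pred, y):
        if not yi:
            continue
        for pj, yj in zip(pred, y):
            if yj:
                continue
            pairs += 1
            if pi > pj:
                conc += 1
            elif pi == pj:
                ties += 1
    return (conc + 0.5 * ties) / pairs

y = [0, 0, 1, 0, 1, 0, 1, 1]
pred = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
pred_cubed = [p ** 3 for p in pred]  # same ranking, much lower risks

print(c_index(pred, y), c_index(pred_cubed, y))  # 0.8125 0.8125
# observed event rate 0.5 vs mean predicted risk of roughly 0.162
print(sum(y) / len(y), sum(pred_cubed) / len(pred_cubed))
```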

               Frank

                   -----Original Message-----
                   From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
                   On Behalf Of Maithili Shiva
                   Sent: Tuesday, October 07, 2008 8:22 AM
                   To: r-help@r-project.org
                   Subject: [R] How to validate model?

                   Hi!

                   I am working on a scorecard model and I have arrived at the
                   regression equation. I have used logistic regression in R.

                   My question is: how do I validate this model? I do have a
                   hold-out sample of 5000 customers.

                   Please guide me. The problem is that I have never used logistic
                   regression before, nor am I used to credit scoring models.

                   Thanks in advance

                   Maithili

                   ______________________________________________
                   R-help@r-project.org mailing list
                   https://stat.ethz.ch/mailman/listinfo/r-help
                   PLEASE do read the posting guide
                   http://www.R-project.org/posting-guide.html
                   and provide commented, minimal, self-contained, reproducible
                   code.

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

--
Regards,

Ajay Ohri
http://tinyurl.com/liajayohri
