Ajay Ohri wrote:
Here is one approach:

Score the hold-out sample with the model.

Check and compare ROC curves between the build and validation datasets.

Check for changes in parameter estimates (coefficients of variables), p-values, and signs.

Check the binning (response rate versus deciles of individual variables).

Check concordance and the KS statistic.
A decile-wise view of model performance (predicted versus actual, with rank ordering of the deciles) helps in explaining the model to a business audience, who generally have business-specific input that may require the scoring model to be tweaked; see the sketch below.
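
A minimal sketch of these checks in R (hypothetical names throughout: data frames build and holdout with a 0/1 response bad, a fitted glm object fit, and the pROC package supplying the ROC curves):

    library(pROC)                                  # roc() and auc()

    p_build   <- predict(fit, newdata = build,   type = "response")
    p_holdout <- predict(fit, newdata = holdout, type = "response")

    ## Compare ROC curves between the build and validation samples
    roc_b <- roc(build$bad,   p_build)
    roc_h <- roc(holdout$bad, p_holdout)
    plot(roc_b); lines(roc_h, col = "red")
    auc(roc_b); auc(roc_h)                         # similar AUCs suggest stability

    ## KS statistic: maximum separation of score distributions, bads vs goods
    grid <- sort(p_holdout)
    ks   <- max(abs(ecdf(p_holdout[holdout$bad == 1])(grid) -
                    ecdf(p_holdout[holdout$bad == 0])(grid)))

    ## Decile-wise predicted versus actual (rank ordering of deciles)
    dec <- cut(p_holdout, quantile(p_holdout, 0:10/10), include.lowest = TRUE)
    data.frame(predicted = tapply(p_holdout,   dec, mean),
               actual    = tapply(holdout$bad, dec, mean))

    ## Refit on the hold-out sample and compare coefficient signs and magnitudes
    cbind(build = coef(fit), holdout = coef(update(fit, data = holdout)))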

This assumes multicollinearity, outliers, and missing values have already been treated, and that the hold-out sample checks for overfitting. You can always rebuild the model using a different random hold-out sample.

A stable model should not change much across such samples.

In actual implementation, try to build real-time triggers for percentage deviations between predicted and actual performance, as sketched below.
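
A sketch of such a trigger (hypothetical names: expected and observed are the predicted and realized bad rates for the latest scoring batch):

    ## Flag the batch when predicted and actual diverge by more than 10%
    deviation <- abs(observed - expected) / expected
    if (deviation > 0.10)
        warning("Model drift: predicted vs actual deviate by ",
                round(100 * deviation, 1), "%")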

Regards,

Ajay

I wouldn't recommend that approach, but legitimate differences of opinion exist on the subject. In particular, I fail to see the purpose of validating indirect measures such as ROC curves.

Frank


www.decisionstats.com

On Wed, Oct 8, 2008 at 1:33 AM, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:

    [EMAIL PROTECTED] wrote:

        Hi Frank,

        Thanks for your feedback! But I think we are talking about two
        different things.

        1) Validation: the generalization performance of the classifier.
        See, for example, "Studies on the Validation of Internal Rating
        Systems" by the BIS.


    I didn't think the desire was for a classifier but instead was for a
    risk predictor.  If prediction is the goal, classification methods
    or accuracy indexes based on classifications do not work very well.
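
    (A small illustration of the point, with made-up risks: two very
    different risk predictions produce identical class labels at a 0.5
    cutoff, so a classification accuracy index cannot tell them apart:)

        p_A <- c(0.51, 0.49)      # model A: near-coin-toss risks
        p_B <- c(0.99, 0.01)      # model B: near-certain risks
        as.integer(p_A > 0.5)     # 1 0
        as.integer(p_B > 0.5)     # 1 0 -- same labels, very different risks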



        2) Calibration: correct calibration of a PD rating system means
        that the calibrated PD estimates are accurate and conform to the
        observed default rates. See, for instance, "An Overview and
        Framework for PD Backtesting and Benchmarking" by Castermans et al.
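
        (A minimal sketch of a per-grade backtest in R, under assumed
        names: pd holds the calibrated PD per account and def the
        observed 0/1 default flag:)

            ## Group accounts into rating grades by calibrated PD
            grades <- cut(pd, quantile(pd, 0:5/5), include.lowest = TRUE)

            ## Mean calibrated PD versus observed default rate, per grade
            data.frame(mean_pd  = tapply(pd,  grades, mean),
                       obs_rate = tapply(def, grades, mean))

            ## Binomial test of observed defaults against calibrated PD in one grade
            g <- levels(grades)[1]
            binom.test(sum(def[grades == g]), sum(grades == g),
                       p = mean(pd[grades == g]))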


    I'm unclear on what you mean here.  Correct calibration of a
    predictive system means that the UNcalibrated estimates are accurate
    (i.e., they don't need any calibration).  (What is PD?)



        Frank, you are referring to #1 and I am referring to #2.
        Nonetheless, I would never create a rating system if my model
        didn't discriminate better than a coin toss.


    For sure
    Frank



        Regards,

        Pedro






        -----Original Message-----
        From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED]]
        Sent: Tuesday, October 07, 2008 11:02 AM
        To: Rodriguez, Pedro
        Cc: [EMAIL PROTECTED]; r-help@r-project.org
        Subject: Re: [R] How to validate model?

        [EMAIL PROTECTED] wrote:

            Usually one validates scorecards with the ROC curve, Pietra
            Index, KS test, etc. You may be interested in Working Paper 14
            (WP 14) from the BIS (www.bis.org).

            Regards,

            Pedro


        No, the validation should be done using an absolute reliability
        (calibration) curve.  You need to verify that at all levels of
        predicted risk there is agreement with the true probability of
        failure.  An ROC curve does not do that, and I doubt the others
        do.  A resampling-corrected loess calibration curve is a good
        approach, as implemented in the Design package's calibrate
        function.
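
        (A minimal sketch of that approach, with hypothetical predictor
        names; the Design package has since been superseded by rms, where
        lrm() and calibrate() work the same way:)

            library(rms)               # successor to the Design package

            ## lrm() needs x=TRUE, y=TRUE so calibrate() can resample the fit
            fit <- lrm(bad ~ age + income, data = build, x = TRUE, y = TRUE)

            ## Bootstrap-corrected (overfitting-adjusted) calibration curve
            cal <- calibrate(fit, method = "boot", B = 200)
            plot(cal)                  # predicted vs observed probability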

        Frank

            -----Original Message-----
            From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]

            On Behalf Of Maithili Shiva
            Sent: Tuesday, October 07, 2008 8:22 AM
            To: r-help@r-project.org
            Subject: [R] How to validate model?

            Hi!

            I am working on a scorecard model and have arrived at the
            regression equation. I used logistic regression in R.

            My question is: how do I validate this model? I have a
            hold-out sample of 5,000 customers.

            Please guide me. The problem is that I have never used
            logistic regression before, nor am I familiar with credit
            scoring models.

            Thanks in advance

            Maithili






--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University





--
Regards,

Ajay Ohri
http://tinyurl.com/liajayohri




--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
