> In particular I fail to see the purpose of validating indirect measures
> such as ROC curves.

The biggest purpose: it is useful in a marketing/sales meeting context. ;)

Also, decile-specific performance is easy to explain and to monitor, which
makes for faster execution and remodeling.
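
For example, here is a minimal sketch of the decile check in R, assuming a
fitted logistic model "fit" and a hold-out data frame "holdout" with a 0/1
response column "bad" (all names are illustrative):

  ## score the hold-out sample and bin into deciles of predicted risk
  p <- predict(fit, newdata = holdout, type = "response")
  dec <- cut(p, breaks = quantile(p, probs = seq(0, 1, 0.1)),
             include.lowest = TRUE, labels = 1:10)
  ## predicted versus actual event rate per decile; a healthy model
  ## rank-orders the actual rates the same way as the predicted ones
  cbind(predicted = tapply(p, dec, mean),
        actual    = tapply(holdout$bad, dec, mean))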

Regards,

Ajay

On Wed, Oct 8, 2008 at 4:01 AM, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:

> Ajay Ohri wrote:
>
>> This is an approach:
>>
>> Run the model on a hold-out sample.
>>
>> Check and compare ROC curves between the build and validation datasets
>> (a sketch of the ROC and KS checks follows this list).
>>
>> Check for changes in the parameter estimates (coefficients of the
>> variables), their p-values, and their signs.
>>
>> Check for binning (response versus deciles of individual variables).
>>
>> Check concordance and the KS statistic.
>>
>> Decile-wise performance of the model (predicted versus actual, rank
>> ordering of the deciles) helps in explaining the model to a business
>> audience, who generally have some business-specific input that may
>> require the scoring model to be tweaked.
>>
>> This assumes that multicollinearity, outliers, and missing values have
>> already been treated, and that the hold-out sample checks for
>> overfitting. You can always rebuild the model using a different random
>> hold-out sample.
>>
>> A stable model would not change too much.
>>
>> In actual implementation, try to build real-time triggers for percentage
>> deviations between predicted and actual.
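>>
>> As a rough sketch of the ROC and KS checks, using the ROCR package (the
>> score vectors and 0/1 outcome vectors for the build and validation sets
>> are hypothetical names):
>>
>>   library(ROCR)
>>   pred.b <- prediction(score.build, y.build)
>>   pred.v <- prediction(score.valid, y.valid)
>>   ## AUC on build versus validation; a large drop suggests overfitting
>>   unlist(performance(pred.b, "auc")@y.values)
>>   unlist(performance(pred.v, "auc")@y.values)
>>   ## KS statistic: maximum gap between the TPR and FPR curves
>>   perf.v <- performance(pred.v, "tpr", "fpr")
>>   max(perf.v@y.values[[1]] - perf.v@x.values[[1]])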
>>
>> Regards,
>>
>> Ajay
>>
>
> I wouldn't recommend that approach, but legitimate differences of opinion
> exist on the subject.  In particular, I fail to see the purpose of
> validating indirect measures such as ROC curves.
>
> Frank
>
>
>> www.decisionstats.com
>>
>> On Wed, Oct 8, 2008 at 1:33 AM, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:
>>
>>
>>    [EMAIL PROTECTED] wrote:
>>
>>        Hi Frank,
>>
>>        Thanks for your feedback! But I think we are talking about two
>>        different things.
>>
>>        1) Validation: the generalization performance of the classifier.
>>        See, for example, "Studies on the Validation of Internal Rating
>>        Systems" by the BIS.
>>
>>
>>    I didn't think the desire was for a classifier but instead was for a
>>    risk predictor.  If prediction is the goal, classification methods
>>    or accuracy indexes based on classifications do not work very well.
>>
>>
>>
>>        2) Calibration: correct calibration of a PD rating system means
>>        that the calibrated PD estimates are accurate and conform to the
>>        observed default rates. See, for instance, "An Overview and
>>        Framework for PD Backtesting and Benchmarking" by Castermans et al.
>>
>>
>>    I'm unclear on what you mean here.  Correct calibration of a
>>    predictive system means that the UNcalibrated estimates are accurate
>>    (i.e., they don't need any calibration).  (What is PD?)
>>
>>
>>
>>        Frank, you are referring to #1 and I am referring to #2.
>>        Nonetheless, I would never create a rating system if my model
>>        didn't discriminate better than a coin toss.
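>>
>>        As a quick check of that coin-toss floor, a minimal sketch using
>>        somers2 from the Hmisc package (the score and 0/1 outcome vectors
>>        are illustrative names):
>>
>>          library(Hmisc)
>>          ## C = 0.5 is coin-toss discrimination; higher is better
>>          somers2(score, y)["C"]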
>>
>>
>>    For sure
>>    Frank
>>
>>
>>
>>        Regards,
>>
>>        Pedro
>>
>>
>>
>>
>>
>>
>>        -----Original Message-----
>>        From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED]]
>>        Sent: Tuesday, October 07, 2008 11:02 AM
>>        To: Rodriguez, Pedro
>>        Cc: [EMAIL PROTECTED]; r-help@r-project.org
>>        Subject: Re: [R] How to validate model?
>>
>>        [EMAIL PROTECTED] wrote:
>>
>>            Usually one validates scorecards with the ROC curve, the
>>            Pietra index, the KS test, etc. You may be interested in
>>            Working Paper 14 from the BIS (www.bis.org).
>>
>>            Regards,
>>
>>            Pedro
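>>
>>            For the KS test, a minimal sketch in R (assuming a score
>>            vector and a 0/1 default indicator y; names illustrative):
>>
>>              ## two-sample KS: separation between the score
>>              ## distributions of the goods and the bads
>>              ks.test(score[y == 0], score[y == 1])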
>>
>>
>>        No, the validation should be done using an absolute reliability
>>        (calibration) curve.  You need to verify that at all levels of
>>        predicted risk there is agreement with the true probability of
>>        failure.  An ROC curve does not do that, and I doubt the others
>>        do.  A resampling-corrected loess calibration curve is a good
>>        approach, as implemented in the Design package's calibrate
>>        function.
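>>
>>        A minimal sketch of that approach, assuming a training data frame
>>        "train" with a binary response y (the formula and variable names
>>        are illustrative):
>>
>>          library(Design)
>>          ## keep the design matrix and response so calibrate() can
>>          ## resample the fit
>>          f <- lrm(y ~ x1 + x2, data = train, x = TRUE, y = TRUE)
>>          ## overfitting-corrected calibration curve via the bootstrap
>>          cal <- calibrate(f, B = 200)
>>          plot(cal)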
>>
>>        Frank
>>
>>            -----Original Message-----
>>            From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>>
>>            On Behalf Of Maithili Shiva
>>            Sent: Tuesday, October 07, 2008 8:22 AM
>>            To: r-help@r-project.org
>>            Subject: [R] How to validate model?
>>
>>            Hi!
>>
>>            I am working on a scorecard model and I have arrived at the
>>            regression equation. I used logistic regression in R.
>>
>>            My question is: how do I validate this model? I have a
>>            hold-out sample of 5000 customers.
>>
>>            Please guide me. The problem is that I have never used
>>            logistic regression before, nor am I familiar with credit
>>            scoring models.
>>
>>            Thanks in advance
>>
>>            Maithili
>>
>>    --
>>    Frank E Harrell Jr   Professor and Chair           School of Medicine
>>                         Department of Biostatistics   Vanderbilt University
>>
>>
>>
>>
>>
>> --
>> Regards,
>>
>> Ajay Ohri
>> http://tinyurl.com/liajayohri
>>
>>
>>
>
> --
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                     Department of Biostatistics   Vanderbilt University
>



-- 
Regards,

Ajay Ohri
http://tinyurl.com/liajayohri


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
