The best validation of assumptions is a good knowledge of the origin of your 
data.  And with 18 bullet points below, if you run all of these tests every 
time, you will end up with a lot of false positives even when all your 
assumptions are met.  It is important to understand your data well enough to 
know which assumptions are most likely to be violated, so that you can focus 
on those, and to know which violations your technique is robust against.

Rather than using the strict tests, whose hypotheses may not match exactly 
what you want to test, the vis.test function from the TeachingDemos package 
may be appropriate.

Specific comments inline below:

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Jay
> Sent: Thursday, February 18, 2010 6:33 AM
> To: r-help@r-project.org
> Subject: Re: [R] Checking the assumptions for a proper GLM model
> 
> So what I'm looking for is readily available tools/packages that could
> produce some of the following:
> 
> 3.6 Summary of Useful Commands (STATA: Source:
> http://www.ats.ucla.edu/stat/Stata/webbooks/logistic/chapter3/statalog3.htm)
> 
>     * linktest--performs a link test for model specification, in our
> case to check if logit is the right link function to use. This command
> is issued after the logit or logistic command.

The scglm function in the forward package looks like it may do this.
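
I have not tried scglm myself, but the idea behind Stata's linktest is easy 
to reproduce by hand: refit the model with the linear predictor and its 
square as the only covariates and check whether the squared term matters.  A 
sketch with made-up data (x, y, and fit here are hypothetical, and are reused 
in the sketches below):

set.seed(42)                         # hypothetical example data
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

eta  <- predict(fit, type = "link")  # linear predictor
fit2 <- glm(y ~ eta + I(eta^2), family = binomial)
summary(fit2)  # a significant I(eta^2) term suggests a misspecified link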


>     * lfit--performs goodness-of-fit test, calculates either Pearson
> chi-square goodness-of-fit statistic or Hosmer-Lemeshow chi-square
> goodness-of-fit depending on if the group option is used.

The chisq.test function in the stats package does goodness-of-fit tests, and 
several other functions that may help show up when searching for "goodness" 
(I did not see any specific to glm models).  

The book Regression Modeling Strategies (which the rms package supports) talks 
a bit about the Hosmer-Lemeshow test and supports my claim that you really need 
to understand the data before using this test.  It also presents an alternative.
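
If you do still want the Hosmer-Lemeshow statistic, a hand-rolled version (10 
groups by fitted probability, using the hypothetical fit and y from the 
sketch above) might look like:

p   <- fitted(fit)
grp <- cut(p, quantile(p, seq(0, 1, 0.1)), include.lowest = TRUE)
obs <- tapply(y, grp, sum)     # observed events per decile of risk
ex  <- tapply(p, grp, sum)     # expected events per decile of risk
n   <- tapply(p, grp, length)
HL  <- sum((obs - ex)^2 / (ex * (1 - ex / n)))
pchisq(HL, df = length(n) - 2, lower.tail = FALSE)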

>     * fitstat -- is a post-estimation command that computes a variety
> of measures of fit.

It is hard to find equivalents without knowing what the measures are.  Many 
can probably be computed from the glm summary information; others may be 
included in the output from lrm (rms package).
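
A few of the common measures are easy to pull from the fit directly, e.g. 
(again with the hypothetical fit from above):

AIC(fit)                              # Akaike information criterion
BIC(fit)                              # Bayesian information criterion
logLik(fit)                           # log-likelihood
1 - fit$deviance / fit$null.deviance  # McFadden's pseudo R-squared
                                      # (for ungrouped binary data)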

>     * lsens -- graphs sensitivity and specificity versus probability
> cutoff.

There are a couple of packages that do ROC curves, but I find that they are 
easy to do by hand.
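
A sketch of the by-hand version, continuing with the hypothetical fit and y 
from above:

p    <- fitted(fit)
cuts <- seq(0, 1, by = 0.01)
sens <- sapply(cuts, function(ct) mean(p[y == 1] >= ct))
spec <- sapply(cuts, function(ct) mean(p[y == 0] <  ct))
plot(cuts, sens, type = "l", xlab = "Probability cutoff", ylab = "")
lines(cuts, spec, lty = 2)
legend("right", legend = c("Sensitivity", "Specificity"), lty = 1:2)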

>     * lstat -- displays summary statistics, including the
> classification table, sensitivity, and specificity.

These can be computed fairly easily by hand; they are probably also available 
in packages like epicalc or ROC.  Their value as diagnostics is another matter.
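
By hand, the classification table at a chosen cutoff is just a 
cross-tabulation (this sketch assumes both levels occur on each margin):

ct  <- 0.5                                  # chosen cutoff
tab <- table(predicted = p >= ct, observed = y == 1)
tab                                         # classification table
tab["TRUE", "TRUE"]   / sum(tab[, "TRUE"])  # sensitivity
tab["FALSE", "FALSE"] / sum(tab[, "FALSE"]) # specificity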

>     * lroc -- graphs and calculates the area under the ROC curve based
> on the model.

I believe that lrm (rms package) computes the area under the curve.  It is 
also an easy one to calculate by hand.
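
By hand, the AUC is the probability that a randomly chosen event has a higher 
fitted value than a randomly chosen non-event, which is just a rescaled 
Wilcoxon/Mann-Whitney statistic:

n1 <- sum(y == 1)
n0 <- sum(y == 0)
(sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)  # AUC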

>     * listcoef--lists the estimated coefficients for a variety of
> regression models, including logistic regression.

The coef and summary functions provide the coefficients for glm and other 
models.
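
For a binomial glm, exponentiating the coefficients gives odds ratios:

coef(fit)                    # coefficients on the log-odds scale
summary(fit)$coefficients    # with standard errors and z tests
exp(coef(fit))               # as odds ratios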


>     * predict dbeta --  Pregibon delta beta influence statistic

I don't know about this one, but see below if it is based on leave-one-out 
statistics.

>     * predict deviance -- deviance residual

The resid (residuals) function has an option to return deviance residuals (it 
looks like that is the default for glm).
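
For example:

resid(fit)                    # deviance residuals (the glm default)
resid(fit, type = "pearson")  # Pearson residuals
rstandard(fit)                # standardized deviance residuals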

>     * predict dx2 -- Hosmer and Lemeshow change in chi-square
> influence statistic
>     * predict dd -- Hosmer and Lemeshow change in deviance statistic
>     * predict hat -- Pregibon leverage

I don't know these ones, though hatvalues(fit) may cover the Pregibon 
leverage.

>     * predict residual -- Pearson residuals; adjusted for the
> covariate pattern
>     * predict rstandard -- standardized Pearson residuals; adjusted
> for the covariate pattern

See ?residuals.glm to see if any of those options work for you.

>     * ldfbeta -- influence of each individual observation on the
> coefficient estimate ( not adjusted for the covariate pattern)

For linear models there is a nice computational shortcut for leave-one-out 
statistics; for glms you need to refit the model each time.  But with a fast 
computer this is still fairly quick and easy.  There may be existing functions 
to do this, but it would only take a couple of lines of code to do it manually.
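
For example, a rough sketch of the refitting approach, with the hypothetical 
fit, x, and y from the earlier sketches:

dfb <- matrix(NA, length(y), length(coef(fit)),
              dimnames = list(NULL, names(coef(fit))))
for (i in seq_along(y)) {   # refit without observation i
  dfb[i, ] <- coef(fit) - coef(glm(y[-i] ~ x[-i], family = binomial))
}
head(dfb)                   # leave-one-out change in each coefficient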

>     * graph with [weight=some_variable] option
>     * scatlog--produces scatter plot for logistic regression.

Try plotting the fitted model directly with plot(fit); glm objects use the 
plot.lm method (see ?plot.lm).

>     * boxtid--performs power transformation of independent variables
> and performs nonlinearity test.
> 

If potential nonlinearity is an issue, splines may work better for this.  
There are some good examples of testing for and using splines in RMS (the 
book) and rms (the package).
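
As a sketch using base R's splines package (rcs in the rms package plays a 
similar role), compare the linear fit to a natural spline fit:

library(splines)
fit.ns <- glm(y ~ ns(x, df = 3), family = binomial)  # natural spline in x
anova(fit, fit.ns, test = "Chisq")  # small p-value suggests nonlinearity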

> But, since I'm new to GLM, I would greatly appreciate how you/others
> go about and test the validity of a GLM model.


Hope this helps,


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

