Dear Charlie,

I admit that I haven't read your email closely, but here is a way to test for non-proportional odds using the ordinal package (warning: self-promotion), using the wine data set, also from the ordinal package. There is more information in the package vignettes.

Hope this is something you can use.
Cheers, Rune

> library(ordinal)
> ## Fit model:
> fm <- clm(rating ~ temp + contact, data=wine)
> summary(fm)
formula: rating ~ temp + contact
data:    wine

 link  threshold nobs logLik AIC    niter max.grad cond.H
 logit flexible  72   -86.49 184.98 6(0)  4.64e-15 2.7e+01

Coefficients:
           Estimate Std. Error z value Pr(>|z|)
tempwarm     2.5031     0.5287   4.735 2.19e-06 ***
contactyes   1.5278     0.4766   3.205  0.00135 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Threshold coefficients:
    Estimate Std. Error z value
1|2  -1.3444     0.5171  -2.600
2|3   1.2508     0.4379   2.857
3|4   3.4669     0.5978   5.800
4|5   5.0064     0.7309   6.850

> ## Model with non-proportional odds for contact:
> fm2 <- clm(rating ~ temp, nominal=~contact, data=wine)
> ## Likelihood ratio test of non-proportional odds:
> anova(fm, fm2)
Likelihood ratio tests of cumulative link models:

    formula:                 nominal:  link:  threshold:
fm  rating ~ temp + contact  ~1        logit  flexible
fm2 rating ~ temp            ~contact  logit  flexible

    no.par    AIC  logLik LR.stat df Pr(>Chisq)
fm       6 184.98 -86.492
fm2      9 190.42 -86.209  0.5667  3      0.904

> ## Automatic tests of non-proportional odds for all variables:
> nominal_test(fm)
Tests of nominal effects

formula: rating ~ temp + contact
        Df  logLik    AIC    LRT Pr(>Chi)
<none>     -86.492 184.98
temp     3 -84.904 187.81 3.1750   0.3654
contact  3 -86.209 190.42 0.5667   0.9040
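The same test can be run on the model from the quoted message below by refitting it with clm(), which takes the same formula/data interface as polr(). A minimal sketch, assuming the FGV1b data frame constructed in the quoted code (the model name fm.chw is made up here, and only five of the 15 predictors are shown for brevity):

library(ordinal)
## Refit the polr() model as a cumulative link model; with data= supplied,
## the FGV1b$ prefixes in the formula are unnecessary.
fm.chw <- clm(FG1_val_cat ~ X + Y + Slope + Ele + Aspect, data = FGV1b)
## Likelihood ratio tests of non-proportional odds, one term at a time:
nominal_test(fm.chw)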
On 25 November 2014 at 17:21, Charlotte Whitham <charlotte.whit...@gmail.com> wrote:
> Dear list,
>
> I have used the ‘polr’ function in the MASS package to run an ordinal
> logistic regression for an ordinal categorical response variable with 15
> continuous explanatory variables.
> I have used the code (shown below) to check that my model meets the
> proportional odds assumption, following advice provided at
> http://www.ats.ucla.edu/stat/r/dae/ologit.htm, which has been extremely
> helpful, thank you to the authors! However, I'm a little worried about the
> output implying that not only are the coefficients across various cutpoints
> similar, but that they are exactly the same (see graphic below).
>
> Here is the code I used (and see attached for the output graphic):
>
> FGV1b <- data.frame(FG1_val_cat = factor(FGV1b[, "FG1_val_cat"]),
>                     scale(FGV1[, c("X", "Y", "Slope", "Ele", "Aspect",
>                                    "Prox_to_for_FG", "Prox_to_for_mL",
>                                    "Prox_to_nat_border", "Prox_to_village",
>                                    "Prox_to_roads", "Prox_to_rivers",
>                                    "Prox_to_waterFG", "Prox_to_watermL",
>                                    "Prox_to_core", "Prox_to_NR",
>                                    "PCA1", "PCA2", "PCA3")]))
>
> b <- polr(FGV1b$FG1_val_cat ~ FGV1b$X + FGV1b$Y + FGV1b$Slope + FGV1b$Ele +
>           FGV1b$Aspect + FGV1b$Prox_to_for_FG + FGV1b$Prox_to_for_mL +
>           FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village +
>           FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG +
>           FGV1b$Prox_to_watermL + FGV1b$Prox_to_core + FGV1b$Prox_to_NR,
>           data = FGV1b, Hess = TRUE)
>
> # Checking the assumption. The following code estimates the values to be
> # graphed. First it shows the logit transformations of the probabilities of
> # being greater than or equal to each value of the target variable:
>
> library(Hmisc)  # summary(formula, fun=) below is Hmisc's summary.formula
>
> FGV1b$FG1_val_cat <- as.numeric(FGV1b$FG1_val_cat)
>
> sf <- function(y) {
>   c('VC>=1' = qlogis(mean(FGV1b$FG1_val_cat >= 1)),
>     'VC>=2' = qlogis(mean(FGV1b$FG1_val_cat >= 2)),
>     'VC>=3' = qlogis(mean(FGV1b$FG1_val_cat >= 3)),
>     'VC>=4' = qlogis(mean(FGV1b$FG1_val_cat >= 4)),
>     'VC>=5' = qlogis(mean(FGV1b$FG1_val_cat >= 5)),
>     'VC>=6' = qlogis(mean(FGV1b$FG1_val_cat >= 6)),
>     'VC>=7' = qlogis(mean(FGV1b$FG1_val_cat >= 7)),
>     'VC>=8' = qlogis(mean(FGV1b$FG1_val_cat >= 8)))
> }
>
> (t <- with(FGV1b, summary(as.numeric(FGV1b$FG1_val_cat) ~ FGV1b$X + FGV1b$Y +
>            FGV1b$Slope + FGV1b$Ele + FGV1b$Aspect + FGV1b$Prox_to_for_FG +
>            FGV1b$Prox_to_for_mL + FGV1b$Prox_to_nat_border +
>            FGV1b$Prox_to_village + FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers +
>            FGV1b$Prox_to_waterFG + FGV1b$Prox_to_watermL + FGV1b$Prox_to_core +
>            FGV1b$Prox_to_NR, fun = sf)))
>
> # The table displays the (linear) predicted values we would get if we
> # regressed our dependent variable on our predictor variables one at a time,
> # without the parallel slopes assumption. So now we can run a series of
> # binary logistic regressions with varying cutpoints on the dependent
> # variable to check the equality of coefficients across cutpoints.
>
> par(mfrow = c(1, 1))
> plot(t, which = 1:8, pch = 1:8, xlab = 'logit', main = ' ',
>      xlim = range(t[, 7:8]))
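One detail worth checking in the snippet above: sf() never uses its argument y; every qlogis() call reads the full FGV1b$FG1_val_cat column, so each row of the summary table (and each set of plotted points) is computed from the same data, which would produce estimates that are exactly the same across predictors. A corrected helper, following the UCLA code the message cites:

sf <- function(y) {
  ## 'y' is the subset of the response that summary() passes in; it must be
  ## used here rather than the full FGV1b$FG1_val_cat column, or every row
  ## of the table comes out identical. 'VC>=1' is dropped because
  ## P(y >= 1) is always 1, so its logit is Inf.
  c('VC>=2' = qlogis(mean(y >= 2)),
    'VC>=3' = qlogis(mean(y >= 3)),
    'VC>=4' = qlogis(mean(y >= 4)),
    'VC>=5' = qlogis(mean(y >= 5)),
    'VC>=6' = qlogis(mean(y >= 6)),
    'VC>=7' = qlogis(mean(y >= 7)),
    'VC>=8' = qlogis(mean(y >= 8)))
}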
> Apologies, I am no statistics expert, and perhaps I am missing something
> obvious here. However, I have spent a long time trying to figure out whether
> there is a problem in how I tested the model assumption, and also trying to
> find other ways to run the same kind of model.
>
> For example, I read in many help mailing lists that others use the vglm
> function (in the VGAM package) and the lrm function (in the rms package); for
> example, see here:
> http://stats.stackexchange.com/questions/25988/proportional-odds-assumption-in-ordinal-logistic-regression-in-r-with-the-packag
> I have tried to run the same models but keep running into warnings and
> errors.
>
> For example, when I try to fit the vglm model with the ‘parallel=FALSE’
> argument (which, as the link above mentions, is important for testing the
> proportional odds assumption), I encounter the following error:
>
> Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'y'
> In addition: Warning message:
> In Deviance.categorical.data.vgam(mu = mu, y = y, w = w, residuals = residuals, :
>   fitted values close to 0 or 1
>
> And after many searches for help, I can't seem to find a way to fix this
> problem.
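For what it's worth, vglm() fits of the kind the linked thread describes look roughly like the sketch below (a sketch only, assuming FG1_val_cat is still the factor created above, and using two of the 15 predictors as stand-ins). The "fitted values close to 0 or 1" warning often accompanies sparse cells or separation, which becomes more likely when every slope is freed at once, so starting from a smaller model may help:

library(VGAM)
## Proportional odds (parallel slopes), comparable to polr():
v0 <- vglm(FG1_val_cat ~ Ele + Slope,
           family = cumulative(parallel = TRUE), data = FGV1b)
## Non-proportional odds: a separate slope at each of the 7 cutpoints:
v1 <- vglm(FG1_val_cat ~ Ele + Slope,
           family = cumulative(parallel = FALSE), data = FGV1b)
## Likelihood ratio test of the proportional odds assumption:
lrtest(v1, v0)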
> I would like to ask if there is anyone who might understand, and be able to
> explain to me, why the graph I produced above looks as it does. If indeed it
> means that something isn't right, could you please help me find a way to test
> the proportional odds assumption using just the polr function? If that is not
> possible, I will resort to using the vglm function, but would then need some
> help to explain why I keep getting the error given above.
>
> I hope this is clear. Please do let me know if I should provide more
> information that would help address this query.
>
> NOTE: As background, there are 1000 datapoints here, which are actually
> location points across a study area. I am looking to see if there are any
> relationships between the categorical response variable and these 15
> explanatory variables. All 15 explanatory variables are spatial
> characteristics (for example, elevation, x-y coordinates, proximity to
> forest, etc.). The 1000 datapoints were randomly allocated using a GIS, but I
> took a stratified sampling approach: I made sure that 125 points were
> randomly chosen within each of the 8 categorical response levels. I hope this
> information is also helpful.
>
> I am extremely grateful to anyone who could give me some guidance with this.
>
> Thank you very much for your time,
>
> Charlie

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.