On Nov 14, 2010, at 8:35 AM, Jürgen Biedermann wrote:

Hi David,

Thank you very much for your answer. It helps me a lot. The offset argument was the key (I didn't understand the description in the R- help file) Rereading my email I found a mistake in the definition of my formula. Instead of p(y) = exp(a + c1*x1 + c2*x2), it has to be: p(y) = exp(a + c1*x1 + c2*x2)/(1+exp(a + c1*x1 + c2*x2)), but actually that doesn't matter much in our case.

Oh, but I think it does matter (at least in the situation where p(event) are above 1/20 or so. At lower probabilities it wouldn't matter very much. You are posting above the default logistic equation used when the distribution is specified as "binomial" which I meant to include but forgot. Had I remembered to include the ..., family="binomial" it would have implied what you write above. So:

firstfit <- glm( c(event,no_event) ~ 1 + x1 +x2, data=dfrm, family="binomial")

# Is the model you cite above and I meant to write:

forcedfit <- glm( c(event,no_event) ~ 1 + offset(logoff), data=dfrm, family="binomial")

(I think the only reason this went unremarked by the usually eagle- eyed rhelp viewership is the weekend has a decreased viewing activity.)


The anova results would have not much interpretability in this setting. You would be testing for the Intercept being zero under very artificial conditions. You have eliminated much statistical meaning by forcing the form of the results.

Imagine the following. I develop a model on one dataset and want to validate it on another. So I could use the coefficents trained on the first dataset to define a glm model (named: ModelV) on the second dataset. Then i could test this model against a NULL model (named: ModelV0) of the second dataset with anova(ModelV, ModelV0, test="Chisq").

Assuming this is a split dataset from run 1 in a split datast, why not develop a score from the the first set and run it against a model that includes the score and the original variables.

Model 1: mod1 <- glm( c(event,no_event) ~ 1 + x1 +x2, data=dfrm1)
dfrm2$scr1 <- with(dfrm2, Intercept + c1*x1 + c2*x2))

Model2: mod2 <- glm( c(event,no_event) ~ 0 + x1 +x2 +scr1, data=dfrm2)

(In this manner impact from the deviations induced by varying X values would be assessed as evidence of potential model mis-specification. This would be most successful with larger datasets. It would probably be a better use of a smaller dataset, however, to get a bootstrapped validation of a model. Or to do a one-tenth holdout and do the validations run ten times. Harrells' "Regression Modeling Strategies" gives a good discussion of validation strategies and his rms amd Hmisc packages provide the functions to perform the validation and display results neatly.)



Best Wishes
Jürgen

--
-----------------------------------
Jürgen Biedermann
Blücherstraße 56
10961 Berlin-Kreuzberg
Mobil: +49 176 247 54 354
Home: +49 30 250 11 713
e-mail: juergen.biederm...@gmail.com


--------- Korrespondenz ----------

Betreff: Re: [R] Define a glm object with user-defined coefficients (logistic regression, family="binomial")
Von: David Winsemius <dwinsem...@comcast.net>
An: Jürgen Biedermann <juergen.biederm...@googlemail.com>
Datum: 13.11.2010 17:15

On Nov 13, 2010, at 7:43 AM, Jürgen Biedermann wrote:

Hi there,

I just don't find the solution on the following problem. :(

Suppose I have a dataframe with two predictor variables (x1,x2) and one depend binary variable (y). How is it possible to define a glm object (family="binomial") with a user defined logistic function like p(y) = exp(a + c1*x1 + c2*x2) where c1,c2 are the coefficents which I define. So I would like to do no fitting of the coefficients. Still, I would like to define a GLM object because I could then easily use other functions which need a glm object as argument (e.g. I could use the anova,

The anova results would have not much interpretability in this setting. You would be testing for the Intercept being zero under very artificial conditions. You have eliminated much statistical meaning by forcing the form of the results.

summary functions).

# Assume dataframe name is dfrm with variables event, no_event, x1, x2, and further assume c1 and c2 are also defined:

dfrm$logoff <- with(dfrm, log(c1*x1 + c2*x2))
forcedfit <- glm( c(event,no_event) ~ 1 + offset(logoff), data=dfrm)

(Obviously untested.)


Thank you very much! Greetings
Jürgen

--
-----------------------------------
Jürgen Biedermann


David Winsemius, MD
West Hartford, CT


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to