peter dalgaard <pdalgd <at> gmail.com> writes:

> On 16 Jun 2014, at 05:22 , array chip <arrayprofile <at> yahoo.com> wrote:
>
> > Hi, I am using clogit() from survival package to do conditional
> > logistic regression. I also need to make prediction on an
> > independent dataset to calculate predicted probability. Here is an
> > example:

[snip]

> > Can anyone suggest what's wrong here?
>
> The direct cause is that clogit() works by using the fact that the
> likelihood is equivalent to a coxph() likelihood with stratification
> and all observation lengths set to 1. Therefore the analysis is
> formally on Surv(rep(1, 150L), status) and that goes belly-up if you
> apply the same formula to a data set of different length.
>
> However, notice that there is no such thing as predict.clogit(), so
> you are attempting predict.coxph() on a purely formal Cox model. It
> is unclear to what extent predicted values, in the sense of coxph()
> are compatible with predictions in conditional logit models.
>
> I'm rusty on this, but I think what you want is something like
>
> m <- model.matrix(~ x1 + x2 - 1, data=dat.test)
> pp <- exp(m %*% coef(fit))
> pps <- ave(pp, dat.test$set, FUN=sum)
> pp/pps
>
> i.e. the conditional probability that an observation is a case given
> covariates and that there is one case in each set (in the data given,
> you have sets of three with one being a case, so all predicted
> probabilities are close to 0.33). For more general matched sets, I'm
> not really sure what one wants. Real experts are welcome to chime
> in.

For the general situation of n cases in a stratum of size N, you want
the probability that the unit in question is one of n units drawn from
a stratum of size N without replacement with unequal probabilities of
selection over the units.

I am *not* an expert on that, but there is plenty written on it.

Horvitz, Daniel G., and Donovan J. Thompson. "A generalization of
sampling without replacement from a finite universe." Journal of the
American Statistical Association 47.260 (1952): 663-685.

is a place to start.
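A rough brute-force sketch of that inclusion probability, in case it
helps. It rests on assumptions that are mine and not from the thread:
units are drawn one at a time, each draw has probability proportional
to exp(linear predictor) among the units still in the stratum, and
incl_prob(), beta and the small dat.test below are made up for
illustration (beta stands in for coef(fit)). It enumerates every
ordered sample, so it is only practical for small strata.

```r
# Inclusion probability of each unit when n units are drawn one at a time,
# without replacement, with draw probabilities proportional to the weights
# of the units still in the stratum (an assumed sampling scheme).
incl_prob <- function(w, n) {
  N <- length(w)
  stopifnot(n >= 1, n <= N, all(w > 0))
  out <- numeric(N)
  # Walk every ordered sample; `p` is the probability of the draws made
  # so far, `chosen` holds the indices already drawn.
  walk <- function(chosen, p) {
    if (length(chosen) == n) {
      out[chosen] <<- out[chosen] + p   # credit each unit in this sample
      return(invisible(NULL))
    }
    left <- setdiff(seq_len(N), chosen)
    for (j in left) walk(c(chosen, j), p * w[j] / sum(w[left]))
    invisible(NULL)
  }
  walk(integer(0), 1)
  out
}

# Hypothetical test data: two matched sets of three, made-up coefficients.
dat.test <- data.frame(set = rep(1:2, each = 3),
                       x1  = c(0.2, -1.0, 0.5, 1.3, 0.0, -0.4),
                       x2  = c(1, 0, 0, 1, 1, 0))
beta <- c(x1 = 0.8, x2 = -0.3)          # stand-in for coef(fit)

m  <- model.matrix(~ x1 + x2 - 1, data = dat.test)
pp <- as.vector(exp(m %*% beta))

# n = 1 case per set: should agree with pp / pps from the quoted code
p1 <- ave(pp, dat.test$set, FUN = function(w) incl_prob(w, 1))

# n = 2 cases per set
p2 <- ave(pp, dat.test$set, FUN = function(w) incl_prob(w, 2))
```

Within each set the values in p1 sum to 1 and those in p2 sum to 2,
which is a quick sanity check on the enumeration. Whether this
draw-by-draw scheme is the right model for a given matched design is
a separate question.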
The probability in question is a sum over the
factorial(n)*choose(N-1,n-1) elements corresponding to the samples
(and orders) that include a chosen element. Of course, for n=1 there
is just the one element, pp/pps.

HTH,

Chuck