On Dec 18, 2009, at 7:39 PM, Hien Nguyen wrote:
Thanks a lot for answering my questions.
I have tried running clogit on only 64 observations and 4
independent variables, and it returns results instantly. However,
when I run the same command (still with only 4 independent variables)
on the full data, it has been running for 50 minutes now. :(
Thomas, what do you mean by "maximizing the unconditional likelihood
is fine when the stratum sizes are large"? What I put in "strata(__)"
is actually the set of possible choices (1-64). Each choice is
recorded more than 4000 times (which means I have more than 4000
values of 1, more than 4000 values of 2, and so on).
Does it sound right?
I'm pretty sure he means glm( formula, family="binomial", ...) and
skip the strata specification.
--
David.
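A minimal sketch of what that suggestion might look like, using the built-in infert data because Hien's data is not shown in this thread. The formula and variable names come from the infert examples elsewhere in the thread, not from Hien's model, and whether the stratum should also enter as a factor is a modelling choice this thread does not settle.

```r
# Unconditional logistic regression: glm() with a binomial family
# and no strata() term in the formula.
data(infert)

fit_glm <- glm(case ~ spontaneous + induced,
               family = binomial, data = infert)

# Coefficient table (estimates, standard errors, z values, p values).
coef(summary(fit_glm))
```

The response here is the 0/1 indicator `case`; glm() fits this by iteratively reweighted least squares, so the cost does not depend on how many cases fall in each stratum.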
Thanks a lot
Hien
tlum...@u.washington.edu wrote:
On Fri, 18 Dec 2009, Hien Nguyen wrote:
Dear Drs Winsemius and Berry,
Thanks a lot for your comment and suggestions on running my model.
I am not just new to R but new to CLM as well. :( With your
suggestions, I figured out that I had huge misunderstandings about
the model and the data arrangement.
After my finals, I read the related materials on CLM again and
rearranged the data appropriately before running the model in R.
This time, I have a dataset of more than 250,000 observations
(created from more than 4000 responses) and a model with 15 predictors.
My question is how long the clogit command should take to run,
because it has been running for more than 10 hours on a
quad-core computer and still shows no sign of being done or
almost done. Is that OK, or does my command just not work?
If you have a lot of records with case=1 in a stratum, conditional
logistic regression will be extremely slow. And unnecessary:
maximizing the unconditional likelihood is fine when the stratum
sizes are large.
Note that a quad-core computer won't help. Only one core will be
used in the computations.
-thomas
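One way to see whether a dataset is in that situation is to count the records and the case=1 records in each stratum before fitting. The sketch below does this on the built-in infert data, as an assumption for illustration, since Hien's data is not available here. With the exact conditional likelihood, each stratum's contribution involves a sum over the ways of choosing its cases from its members, so many cases in a large stratum is what makes the fit expensive.

```r
data(infert)

# Number of records with case == 1 in each stratum.
cases_per_stratum <- tapply(infert$case, infert$stratum, sum)

# Number of records in each stratum.
stratum_size <- table(infert$stratum)

# Quick overview of both distributions.
summary(as.vector(cases_per_stratum))
summary(as.vector(stratum_size))
```

For data laid out as Hien describes (64 strata with more than 4000 records each), the same two summaries would show immediately how far the strata are from small matched sets.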
Thanks a lot for your response
Hien
Charles C. Berry wrote:
On Fri, 4 Dec 2009, David Winsemius wrote:
On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
Dear Dr. Winsemius,
Thank you very much for your reply.
I have tried many possible combinations (even with the model of
only 2 predictors) but it produces the same message. With more
than 4000 observations, I think 14 predictors might not be too
many.
It is what happens in the factor combinations that concerns me. I
am guessing that some of those predictors are factors. You
really should not ask r-help questions without providing better
descriptions of both the outcomes and the predictor variables.
Although my dependent variable (Pin) is not discrete (it
ranges from 0 to 1), I do not think it will create problems for
the estimation, but I'm not sure.
I would think it _would_ cause problems. As I understand it,
conditional methods create contingency tables. Why are you using
an outcome type that is not consistent with the fundamental
regression assumptions of the clogit function?
I do not get that particular error when I munge the infert
dataset to have case be a random uniform value, but I do get an
error.
infert$case <- runif(nrow(infert))
clogit(case~spontaneous+induced+strata(stratum),data=infert)
Error in Surv(rep(1, 248L), case) : Invalid status value
David, I think you were on the right track. I get this:
-----------
clogit(I(case*runif(length(case)))~spontaneous+induced
+strata(ifelse(stratum>40,NA,stratum)),data=infert)
Error in fitter(X, Y, strats, offset, init, control, weights =
weights, :
NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
Invalid status value, converted to NA
2: In fitter(X, Y, strats, offset, init, control, weights =
weights, :
Ran out of iterations and did not converge
------------
which looks pretty much the same as Hien's error msg
So Hien needs to create a logical status value.
Chuck
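A small check along those lines, as a sketch: the helper name `is_binary_status` is hypothetical (it is not part of the survival package), and infert again stands in for Hien's data. It tests whether a response is logical or coded 0/1 before the model is passed to clogit.

```r
library(survival)
data(infert)   # reload the original data with its 0/1 case indicator

# TRUE only if y is logical or every non-missing value is 0 or 1.
# (Hypothetical helper, written for this illustration.)
is_binary_status <- function(y) {
  is.logical(y) || all(y[!is.na(y)] %in% c(0, 1))
}

is_binary_status(infert$case)          # the original case indicator
is_binary_status(runif(nrow(infert)))  # continuous values, as in the failing calls

# With a 0/1 status, the model from the examples above can be fitted.
fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
coef(fit)
```

How a proportion such as Pin should be turned into a 0/1 status depends on what it measures, which the thread does not say, so that step is left out here.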
p.s.
sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] splines stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] survival_2.35-7
loaded via a namespace (and not attached):
[1] tools_2.10.0
So I certainly would not have proceeded to submit a full
analysis to clogit if I could not get a test case to run under
the situation you propose.
--
David
I have checked the pairwise correlations among the predictors and
they are all < 0.5 (which I think is OK). Do you know what else
could cause this error?
Thanks a lot
Hien Nguyen
David Winsemius wrote:
> On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
> > Dear R-helpers,
> >
> > I am very new to R and trying to run the conditional logit model using
> > "clogit " command.
> > I have more than 4000 observations in my dataset and try to predict the
> > dependent variable from 14 independent variables. My command is as
> > follows
> >
> > clmtest1 <-
> > clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
> >
> > However, it produces the following errors:
> >
> > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
> >   NA/NaN/Inf in foreign function call (arg 6)
> > In addition: Warning messages:
> > 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
> > 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
> >   Ran out of iterations and did not converge
> >
> > I search the error message from R forums but it does not say anything
> > for Conditional Logit Model.
>
> With that many predictors in a small dataset, you may have created matrix
> singularities. Perhaps you created a stratum where all of the subjects
> experience the event and others where none did so. The coefficients might
> be driven to infinities. Try simplifying the model.
>
> > Please check for me what it says and what should I do to solve it.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
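The stratum check suggested above can be sketched as follows, again on the built-in infert data as a stand-in, since the IDD strata and the clmdata data frame from the original post are not available. It lists strata in which the 0/1 outcome shows no variation, that is, where every subject or no subject experiences the event.

```r
data(infert)

# Mean of the 0/1 outcome within each stratum:
# 0 means no events in the stratum, 1 means every record is an event.
event_rate <- tapply(infert$case, infert$stratum, mean)

# Labels of strata with no variation in the outcome.
no_variation <- names(event_rate)[event_rate == 0 | event_rate == 1]

length(no_variation)
```

Strata flagged this way carry no information about the coefficients in a conditional fit, so a long list of them is a hint that the strata or the outcome are not set up as intended.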
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
Thomas Lumley Assoc. Professor, Biostatistics
tlum...@u.washington.edu University of Washington, Seattle