I am not that proficient in R. I found some codes on web using those indicator variables to sum up log likelihood. I block out bols in the codes, but I also tried using them as start value for my estimation of ologit. Didn't work. Thanks for your suggestion.
Jun On Wed, Aug 1, 2012 at 5:44 PM, Bert Gunter <gunter.ber...@gene.com> wrote: > Disclaimer: I have not followed this thread at all, but only wish to note: > > 1) Indicator variables are (almost?) never needed in R -- that you are > fooling with them suggests that there is probably a better approach. > > 2) Your bols is just least regression, no? -- If so, there are far > better ways to do this in R. > > 3) ?model.frame and ?model.matrix may be relevant to what you're trying to do. > > While you may get the help you need on this list (though clearly not > from me!), I suspect that you would benefit from consulting with a > local statistician/R expert who would help you with a major rewrite of > your code. > > But ... note my disclaimer again at the top. > > Cheers, > Bert > > On Wed, Aug 1, 2012 at 2:34 PM, Xu Jun <junx...@gmail.com> wrote: >> Thanks Michael. Now I switched my approach after doing some google. >> Following are my new codes: >> >> ################################################### >> library(foreign) >> readin <- read.dta("ordfile.dta", convert.factors=FALSE) >> myvars <- c("depvar", "x1", "x2", "x3") >> mydta <- readin[myvars] >> # remove all missings >> mydta <- na.omit(mydta) >> >> ologit.lf <- function(theta, y, X) { >> X <- as.matrix(X) >> y <- as.matrix(y) >> n <- nrow(X) >> k <- ncol(X) >> b <- theta[1:k] >> tau1 <- as.numeric(theta [k+1]) >> tau2 <- as.numeric(theta [k+2]) >> tau3 <- as.numeric(theta [k+3]) >> >> p1 <- (1/(1+exp( - tau1 + X %*% b))) >> p2 <- (1/(1+exp( - tau2 + X %*% b))) - (1/(1+exp( - tau1 + X %*% b))) >> p3 <- (1/(1+exp( - tau3 + X %*% b))) - (1/(1+exp( - tau2 + X %*% b))) >> p4 <- 1 - (1/(1+exp( - tau3 + X %*% b))) >> >> # create indicator variables >> i <- rep(1, nrow(X)) >> i1 <- as.numeric( i == y) >> i2 <- as.numeric(2*i == y) >> i3 <- as.numeric(3*i == y) >> i4 <- as.numeric(4*i == y) >> >> llik <- sum(i1*log(p1) + i2*log(p2) + i3*log(p3) + i4*log(p4)) >> return(-llik) >> } >> >> y <- (mydta$depvar) >> X <- cbind(mydta$x1, mydta$x2, mydta$x3) >> >> if (FALSE) { # providing start values in optim >> xint <- cbind(rep(1,nrow(X)),X) >> bols <- (solve(t(xint) %*% xint)) %*% (t(xint) %*% y) >> startval <- rbind(as.matrix(bols[2:nrow(bols)]),0,0,0) >> } >> >> runopt <- optim(rep(0, ncol(X)+3), ologit.lf, method="BFGS", >> hessian=T, y=y, X=X) >> ######################################################################### >> >> But then I got the following error message; >> Error in optim(rep(0, ncol(X) + 3), ologit.lf, method = "BFGS", hessian = T, >> : >> initial value in 'vmmin' is not finite >> >> I know there is something wrong with the way that I set up my initial >> values, but just couldn't figure out how. Any help would be greatly >> appreciated! >> >> Jun >> >> On Tue, Jul 31, 2012 at 10:07 PM, R. Michael Weylandt >> <michael.weyla...@gmail.com> wrote: >>> On Tue, Jul 31, 2012 at 7:57 PM, Xu Jun <junx...@gmail.com> wrote: >>>> Dear R listers, >>>> >>>> I am learning the MLE utility optim() in R to program ordered logit >>>> models just as an exercise. See below I have three independent >>>> variables, x1, x2, and x3. Y is coded as ordinal from 1 to 4. Y is not >>>> yet a factor variable here. The ordered logit model satisfies the >>>> parallel regression assumption. The following codes can run through, >>>> but results were totally different from what I got using the polr >>>> function from the MASS package. I think it might be due to the way how >>>> the p is constructed in the ologit.lf function. I am relatively new to >>>> R, and here I would guess probably something related to the class type >>>> (numeric vs. matrix) or something along that line among those if >>>> conditions. Thanks in advance for any suggestion. >>>> >>>> Jun Xu, PhD >>>> Assistant Professor >>>> Department of Sociology >>>> Ball State University >>>> Muncie, IN 47306 >>>> >>>> >>>> #################################################################### >>>> >>>> library(foreign) >>>> readin <- read.dta("ordfile.dta", convert.factors=FALSE) >>>> myvars <- c("depvar", "x1", "x2", "x3") >>>> mydta <- readin[myvars] >>>> # remove all missings >>>> mydta <- na.omit(mydta) >>>> >>>> # theta is the parameter vector >>>> ologit.lf <- function(theta, y, X) { >>>> n <- nrow(X) >>>> k <- ncol(X) >>>> # b is the coefficient vector for independent variables >>>> b <- theta[1:k] >>>> # tau1 is cut-point 1 >>>> tau1 <- theta [k+1] >>>> # tau2 is cut-point 2 >>>> tau2 <- theta [k+2] >>>> # tau3 is cut-point 1 >>>> tau3 <- theta [k+3] >>>> >>>> if (y == 1){ >>>> p <- (1/(1+exp( - tau1 + X %*% b))) >>>> } >>>> if (y == 2) { >>>> p <- (1/(1+exp( - tau2 + X %*% b))) - (1/(1+exp( - tau1 + X %*% b))) >>>> } >>>> if (y == 3) { >>>> p <- (1/(1+exp( - tau3 + X %*% b))) - (1/(1+exp( - tau2 + X %*% b))) >>>> } >>>> if (y == 4) { >>>> p <- 1 - (1/(1+exp( - tau3 + X %*% b))) >>>> } >>>> - sum(p) >>>> } >>>> >>>> y <- as.numeric(mydta$depvar) >>>> X <- cbind (mydta$x1, mydta$x2, mydta$x3) >>>> runopt <- optim(rep(0, ncol(X)+4), ologit.lf, method="BFGS", >>>> hessian=T, y=y, X=X) >>>> >>>> >>>> There were 50 or more warnings (use warnings() to see the first 50) >>>>> warnings() >>>> >>>> 1: In if (y == 1) { ... : >>>> the condition has length > 1 and only the first element will be used >>>> 2: In if (y == 2) { ... : >>>> the condition has length > 1 and only the first element will be used >>> >>> It looks like you've got a fundamental problem in your if/else >>> statements. if and else are control structures and so they operate on >>> the whole program flow -- I think you want the ifelse() function here >>> instead. >>> >>> Take a look at this example: >>> >>> x <- c(1, 5, 9) >>> >>> if(x < 3) {y <- x^2} else {y <- 2} >>> >>> z <- ifelse(x < 3, x^2, 2) >>> >>> print(x) >>> print(y) >>> print(z) >>> >>> See also ?ifelse >>> >>> Best, >>> Michael >>> >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > > -- > > Bert Gunter > Genentech Nonclinical Biostatistics > > Internal Contact Info: > Phone: 467-7374 > Website: > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.