Hi,

I don't think the final verdict is in yet.  Peter's posts have hinted at
ill-conditioning as the crux of the problem.  So, I decided to try a few more
things: (1) standardizing the covariates, (2) using an exact gradient, and
(3) both (1) and (2).

I compute the "exact" gradient using a complex-step derivative approach.  This
works just like standard first-order forward differencing; the only (but
essential) difference is that an imaginary increment, i*dx, is used.  The trick
is that, for an analytic function f,

  f(x + i*h) = f(x) + i*h*f'(x) - (h^2/2)*f''(x) - i*(h^3/6)*f'''(x) + ...,

so Im(f(x + i*h))/h = f'(x) + O(h^2), with no subtraction and hence no
cancellation error.  This, incredibly, gives exact gradients (up to machine
precision).

Here are the code and the results of my experiments:

data <- read.table("h:/computations/optimx_example.dat", header=TRUE, sep=",")
attach(data)
require(optimx)

## model 18
lnl <- function(theta,y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e)%*%e
  # the 2x2 determinant is written out explicitly because det() does not
  # accept the complex matrices that arise under complex-step differentiation
  det.sigma <- sigma[1,1] * sigma[2,2] - sigma[1,2] * sigma[2,1]
  logl <- -n/2 * (2*(1 + log(2*pi)) + log(det.sigma))
  return(-logl)
}

csd <- function(fn, x, ...) {
  # Complex-step derivatives: gradients accurate to machine precision,
  # provided fn is analytic in its argument and accepts complex input
  h <- .Machine$double.eps
  n <- length(x)
  h0 <- g <- rep(0, n)
  for (i in 1:n) {
    h0[i] <- h * 1i                  # imaginary bump in the i-th coordinate
    g[i] <- Im(fn(x + h0, ...)) / h  # no subtraction, hence no cancellation
    h0[i] <- 0
  }
  g
}
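
As a quick sanity check (my addition here, not part of the experiments below),
the complex-step gradient can be compared with a known analytic gradient:

# for f(x) = sum(x^2) the gradient is 2*x, which csd recovers exactly
csd(function(x) sum(x^2), c(1, 2, 3))   # should give 2 4 6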

gr.csd <- function(theta, y1, y2, x1, x2, x3) {
  csd(lnl, theta, y1 = y1, y2 = y2, x1 = x1, x2 = x2, x3 = x3)
}

# equation-by-equation OLS coefficients as starting values; as Peter noted,
# these appear to be the exact solution
start <- c(coef(lm(y1 ~ x1 + x2 + x3)), coef(lm(y2 ~ x1 + x2 + x3)))
p1 <- optimx(start, lnl, y1 = y1, y2 = y2, x1 = x1, x2 = x2, x3 = x3,
             control = list(all.methods = TRUE, maxit = 1500))
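
(A check I did not include in the runs above, but which is easy to add: if
the OLS start really is the optimum, the complex-step gradient should vanish
there, up to rounding.)

round(gr.csd(start, y1 = y1, y2 = y2, x1 = x1, x2 = x2, x3 = x3), 6)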

# numerical gradient
p2 <- optimx(rep(0, 8), lnl, y1 = y1, y2 = y2, x1 = x1, x2 = x2, x3 = x3,
             control = list(all.methods = TRUE, maxit = 1500))

# exact (complex-step) gradient
p2g <- optimx(rep(0, 8), lnl, gr.csd, y1 = y1, y2 = y2, x1 = x1, x2 = x2, x3 = x3,
              control = list(all.methods = TRUE, maxit = 1500))

# Comparing p2 and p2g, we see a dramatic improvement in BFGS when the exact
# gradient is used, and a major difference for L-BFGS-B as well.  The exact
# gradient did not affect the gradient methods CG and spg much, although the
# convergence of Rcgmin did improve.

# standardized covariates
x1s <- scale(x1)
x2s <- scale(x2)
x3s <- scale(x3)

# scaling only (numerical gradient)
p3 <- optimx(rep(0, 8), lnl, y1 = y1, y2 = y2, x1 = x1s, x2 = x2s, x3 = x3s,
             control = list(all.methods = TRUE, maxit = 1500))

# both scaling and exact gradient
p3g <- optimx(rep(0, 8), lnl, gr.csd, y1 = y1, y2 = y2, x1 = x1s, x2 = x2s, x3 = x3s,
              control = list(all.methods = TRUE, maxit = 1500))
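
One caveat worth flagging: the estimates from p3 and p3g are on the
standardized scale.  A small sketch (my own addition, assuming the default
centering and scaling of scale()) for mapping equation-1 coefficients back
to the original scale:

# if y1 = a + b1*x1s + b2*x2s + b3*x3s with xks = (xk - mean(xk))/sd(xk),
# then the original-scale slopes are b/s and the intercept is a - sum(b*m/s)
m <- c(mean(x1), mean(x2), mean(x3))
s <- c(sd(x1), sd(x2), sd(x3))
unscale <- function(a, b) c(a - sum(b * m / s), b / s)
# e.g. unscale(a = est[1], b = est[2:4]) for equation-1 estimates 'est'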

# Comparing p3 and p3g, the exact gradient improved spg dramatically.
# However, it made CG worse!

# Of course, the derivative-free methods newuoa and Nelder-Mead are improved
# by scaling, but not by the availability of exact gradients.  I don't know
# what is wrong with bobyqa in this example.

In short, even with scaling and exact gradients, this optimization problem is 
recalcitrant.

Best,
Ravi.

________________________________________
From: Mike Marchywka [marchy...@hotmail.com]
Sent: Thursday, May 12, 2011 8:30 AM
To: Ravi Varadhan; pda...@gmail.com; alex.ols...@gmail.com
Cc: r-help@r-project.org
Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 
1982 Econometrica R vs Stata

So what was the final verdict on this discussion? I kind of lost track; if
anyone has a minute, could you summarize, or critique my summary below?


Apparently there were two issues: the comparison between R and Stata was one,
and the "optimum" solution was another. As I understand it, there was some
question about R's numerical gradient calculation, which suggests some
features of the function may be worth examining.

The function to be optimized appears to be, as the OP stated, some function of
the residuals of two (unrelated) fits. The residual vectors e1 and e2 are
dotted in various combinations, creating a matrix whose determinant,
(e1.e1)(e2.e2) - (e1.e2)^2, is the quantity to be minimized by choice of
theta. Theta, it seems, is an 8-component vector; 4 components determine e1
and the other 4 determine e2. Presumably a unique solution would require that
e1 and e2, both n-component vectors, point in different directions, or else
both could become arbitrarily large while keeping the error signal at zero.
For fixed magnitudes, collinearity would reduce the "error." The intent would
appear to be to keep the residuals distributed similarly in the two
(unrelated) fits.
I guess my question is, "did anyone determine that there is a unique
solution?" Or am I totally wrong here? (I haven't used these methods myself
to any extent and just run some simple teaching examples, so I'm asking for
my own clarification as much as anything.)
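
One cheap way to probe the uniqueness/conditioning question (a sketch, which
I have not run on the thread's data) would be to look at the condition number
of the design matrix shared by both equations:

# near-collinear columns mean nearly flat directions in the likelihood,
# i.e. a poorly determined (in practice possibly non-unique) optimum
X <- cbind(1, x1, x2, x3)
kappa(X)   # a very large value would support the ill-conditioning diagnosis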

Thanks.

----------------------------------------
> From: rvarad...@jhmi.edu
> To: pda...@gmail.com; alex.ols...@gmail.com
> Date: Sat, 7 May 2011 11:51:56 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> There is something strange in this problem. I think the log-likelihood is 
> incorrect. See the results below from "optimx". You can get much larger 
> log-likelihood values than for the exact solution that Peter provided.
>
> ## model 18
> lnl <- function(theta,y1, y2, x1, x2, x3) {
> n <- length(y1)
> beta <- theta[1:8]
> e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> e <- cbind(e1, e2)
> sigma <- t(e)%*%e
> logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma))) # it looks like there is 
> something wrong here
> return(-logl)
> }
>
> data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
>
> attach(data)
>
> require(optimx)
>
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>
> # the warnings can be safely ignored in the "optimx" calls
> p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
> + x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
>
> Ravi.
> ________________________________________
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf 
> Of peter dalgaard [pda...@gmail.com]
> Sent: Saturday, May 07, 2011 4:46 AM
> To: Alex Olssen
> Cc: r-help@r-project.org
> Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 
> 1982 Econometrica R vs Stata
>
> On May 6, 2011, at 14:29 , Alex Olssen wrote:
>
> > Dear R-help,
> >
> > I am trying to reproduce some results presented in a paper by Anderson
> > and Blundell in 1982 in Econometrica using R.
> > The estimation I want to reproduce concerns maximum likelihood
> > estimation of a singular equation system.
> > I can estimate the static model successfully in Stata but for the
> > dynamic models I have difficulty getting convergence.
> > My R program, which uses the same likelihood function as in Stata, has
> > convergence problems even for the static case.
> >
> > I have copied my R program and the data below. I realise the code
> > could be made more elegant - but it is short enough.
> >
> > Any ideas would be highly appreciated.
>
> Better starting values would help. In this case, almost too good values are 
> available:
>
> start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
>
> which appears to be the _exact_ solution.
>
> Apart from that, it seems that the conjugate gradient methods have 
> difficulties with this likelihood, for some less than obvious reason. 
> Increasing the maxit gets you closer but still not satisfactory.
>
> I would suggest trying out the experimental optimx package. Apparently, some 
> of the algorithms in there are much better at handling this likelihood, 
> notably "nlm" and "nlminb".
>
>
>
>
> >
> > ## model 18
> > lnl <- function(theta,y1, y2, x1, x2, x3) {
> > n <- length(y1)
> > beta <- theta[1:8]
> > e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
> > e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
> > e <- cbind(e1, e2)
> > sigma <- t(e)%*%e
> > logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
> > return(-logl)
> > }
> > p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2,
> > x1=x1, x2=x2, x3=x3)
> >
> > "year","y1","y2","x1","x2","x3"
> > 1929,0.554779,0.266051,9.87415,8.60371,3.75673
> > 1930,0.516336,0.297473,9.68621,8.50492,3.80692
> > 1931,0.508201,0.324199,9.4701,8.27596,3.80437
> > 1932,0.500482,0.33958,9.24692,7.99221,3.76251
> > 1933,0.501695,0.276974,9.35356,7.98968,3.69071
> > 1934,0.591426,0.287008,9.42084,8.0362,3.63564
> > 1935,0.565047,0.244096,9.53972,8.15803,3.59285
> > 1936,0.605954,0.239187,9.6914,8.32009,3.56678
> > 1937,0.620161,0.218232,9.76817,8.42001,3.57381
> > 1938,0.592091,0.243161,9.51295,8.19771,3.6024
> > 1939,0.613115,0.217042,9.68047,8.30987,3.58147
> > 1940,0.632455,0.215269,9.78417,8.49624,3.57744
> > 1941,0.663139,0.184409,10.0606,8.69868,3.6095
> > 1942,0.698179,0.164348,10.2892,8.84523,3.66664
> > 1943,0.70459,0.146865,10.4731,8.93024,3.65388
> > 1944,0.694067,0.161722,10.4465,8.96044,3.62434
> > 1945,0.674668,0.197231,10.279,8.82522,3.61489
> > 1946,0.635916,0.204232,10.1536,8.77547,3.67562
> > 1947,0.642855,0.187224,10.2053,8.77481,3.82632
> > 1948,0.641063,0.186566,10.2227,8.83821,3.96038
> > 1949,0.646317,0.203646,10.1127,8.82364,4.0447
> > 1950,0.645476,0.187497,10.2067,8.84161,4.08128
> > 1951,0.63803,0.197361,10.2773,8.9401,4.10951
> > 1952,0.634626,0.209992,10.283,9.01603,4.1693
> > 1953,0.631144,0.219287,10.3217,9.06317,4.21727
> > 1954,0.593088,0.235335,10.2101,9.05664,4.2567
> > 1955,0.60736,0.227035,10.272,9.07566,4.29193
> > 1956,0.607204,0.246631,10.2743,9.12407,4.32252
> > 1957,0.586994,0.256784,10.2396,9.1588,4.37792
> > 1958,0.548281,0.271022,10.1248,9.14025,4.42641
> > 1959,0.553401,0.261815,10.2012,9.1598,4.4346
> > 1960,0.552105,0.275137,10.1846,9.19297,4.43173
> > 1961,0.544133,0.280783,10.1479,9.19533,4.44407
> > 1962,0.55382,0.281286,10.197,9.21544,4.45074
> > 1963,0.549951,0.28303,10.2036,9.22841,4.46403
> > 1964,0.547204,0.291287,10.2271,9.23954,4.48447
> > 1965,0.55511,0.281313,10.2882,9.26531,4.52057
> > 1966,0.558182,0.280151,10.353,9.31675,4.58156
> > 1967,0.545735,0.294385,10.3351,9.35382,4.65983
> > 1968,0.538964,0.294593,10.3525,9.38361,4.71804
> > 1969,0.542764,0.299927,10.3676,9.40725,4.76329
> > 1970,0.534595,0.315319,10.2968,9.39139,4.81136
> > 1971,0.545591,0.315828,10.2592,9.34121,4.84082
> >
>
> --
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk Priv: pda...@gmail.com
>
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
