You’re right that the step size is effectively adjusted using alpha, beta and gamma in later iterations. The problem is that the values used to generate the *first* simplex depend on the initial values themselves, which makes no sense, as it makes the optimisation not invariant to translations of the problem.
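In fact, the traces quoted below look consistent with a simple rule: the initial step, measured in parscale-scaled units, is 0.1 times the largest scaled starting value, and it is applied to every coordinate. I haven’t checked this against the C sources, so treat the following as a guess that merely reproduces the quoted numbers:

## Guessed rule for the initial simplex, inferred from the traces below
## (an assumption, not taken from the R sources).
guess_step <- function(par, parscale = rep(1, length(par))) {
  0.1 * max(abs(par / parscale)) * parscale  # per-coordinate step, raw units
}
guess_step(c(1, 1))              # 0.1 0.1     -- matches the f1 trace
guess_step(c(5001, 5001))        # 500.1 500.1 -- matches the f2 trace
guess_step(c(1, 5001))           # 500.1 500.1 -- matches the f3 trace
guess_step(c(1, 1), c(1, 5))     # 0.1 0.5     -- matches the parscale example
guess_step(c(1, 1), c(500, 500)) # 0.1 0.1     -- a common factor cancels

If this is what happens, it also explains why multiplying parscale by a common factor leaves the initial simplex unchanged: the factor cancels when the scaled step is converted back to raw units.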
Here’s an analogy. Think of the function to maximise as a mountain placed somewhere on Earth. If you start 1 km east and 1 km north of the mountain and try to find its peak, the values you sample *relative to the peak’s position* should not depend on whether the mountain is situated on the Equator, in Australia or in North America, as long as the actual mountain is identical (i.e., the function is only translated, never scaled). But for optim with method="Nelder-Mead" they apparently do. Also, parscale has a rather mysterious effect on the values chosen in later iterations, while the *absolute* parscale values seem to have no effect on the initial simplex (their *relative* values do have an effect, and a correct one, AFAICS). If my guess about the step rule above is right, there is a simple recentring workaround; see the sketch after the quoted message below.

Karl Ove Hufthammer

On Saturday 18 August 2012 at 07:32 (-0700), Bert Gunter wrote:

> Well, I'm no optimization guru, but a quick reading of Wikipedia said
> that the step size depends on the initial value configuration and is then
> "adjusted" by the algorithm using alpha, beta and gamma scaling
> parameters thru the optimization. So it seems that it is supposed to
> work exactly as you describe. Why do you expect something else?
>
> -- Bert
>
> On Sat, Aug 18, 2012 at 2:30 AM, Karl Ove Hufthammer <k...@huftis.org> wrote:
> > Dear all,
> >
> > I’m having some problems getting optim with method="Nelder-Mead" to work
> > properly. It seems like there is no way of controlling the step size,
> > and the step size seems to depend on the *difference* between the
> > initial values, which makes no sense. Example:
> >
> > f = function(xy, mu1, mu2) {
> >   print(xy)
> >   dnorm(xy[1] - mu1) * dnorm(xy[2] - mu2)
> > }
> > f1 = function(xy) -f(xy, 0, 0)
> > optim(c(1, 1), f1)
> >
> > The first four values evaluated are
> >
> > 1.0, 1.0
> > 1.1, 1.0
> > 1.0, 1.1
> > 0.9, 1.1
> >
> > which is reasonable (a step size of 0.1) for this function. And if I
> > translate both the function and the initial values,
> >
> > f2 = function(xy) -f(xy, 5000, 5000)
> > optim(c(5001, 5001), f2)
> >
> > the first four values are
> >
> > 5001.0, 5001.0
> > 5501.1, 5001.0
> > 5001.0, 5501.1
> > 4500.9, 5501.1
> >
> > With
> >
> > f3 = function(xy) -f(xy, 0, 5000)
> > optim(c(1, 5001), f3)
> >
> > they are
> >
> > 1.0, 5001.0
> > 501.1, 5001.0
> > 1.0, 5501.1
> > -499.1, 5501.1
> >
> > and with
> >
> > f4 = function(xy) -f(xy, -3000, 50000)
> > optim(c(-2999, 50001), f4)
> >
> > they are
> >
> > -2999.0, 50001.0
> > 2001.1, 50001.0
> > -2999.0, 55001.1
> > -7999.1, 55001.1
> >
> > However, the function to optimise is the same in all cases, only
> > translated, not scaled, so the step size *should* be the same. From
> > reading the documentation, it looks like changing parscale should
> > work, and *relative* changes have the intended effect. Example:
> >
> > optim(c(1, 1), f1, control = list(parscale = c(1, 5)))
> >
> > gives the function evaluations
> >
> > 1.0, 1.0
> > 1.1, 1.0
> > 1.0, 1.5
> > 1.1, 0.5
> >
> > But changing both values, e.g.,
> >
> > optim(c(1, 1), f1, control = list(parscale = c(500, 500)))
> >
> > gives the same first four values. There *are* eventually some
> > differences in the values tried, but these don’t seem to correspond to
> > parscale as described in ?optim.
> > For example, for parscale=c(1,1) the parameter values tried are
> >
> >  1: 1, 1
> >  2: 1.1, 1
> >  3: 1, 1.1
> >  4: 0.9, 1.1
> >  5: 0.95, 1.075
> >  6: 0.9, 1
> >  7: 0.85, 0.95
> >  8: 0.95, 0.85
> >  9: 0.9375, 0.9125
> > 10: 0.8, 0.8
> > 11: 0.7, 0.7
> > 12: 0.8, 0.6
> > 13: 0.8125, 0.6875
> > 14: 0.55, 0.45
> >
> > while for parscale=c(500,500) they are
> >
> >  1: 1, 1
> >  2: 1.1, 1
> >  3: 1, 1.1
> >  4: 0.9, 1.1
> >  5: 0.95, 1.075
> >  6: 0.9, 1
> >  7: 0.85, 0.95
> >  8: 0.95, 0.85
> >  9: 0.975, 0.725
> > 10: 0.825, 0.675
> > 11: 0.7375, 0.5125
> > 12: 0.8625, 0.2875
> > 13: 0.859375, 0.453125
> > 14: 0.625000000000001, 0.0750000000000004
> >
> > and for parscale=1/c(50000,50000) they are
> >
> >  1: 1, 1
> >  2: 1.1, 1
> >  3: 1, 1.1
> >  4: 0.9, 1.1
> >  5: 0.95, 1.075
> >  6: 0.9, 1
> >  7: 0.85, 0.95
> >  8: 0.95, 0.85
> >  9: 0.9375, 0.9125
> > 10: 0.8, 0.8
> > 11: 0.7, 0.7
> > 12: 0.8, 0.6
> > 13: 0.8125, 0.6875
> > 14: 0.55, 0.45
> >
> > And there seems to be no way of actually changing the step size to
> > reasonable values (i.e., the same values for optimising f1–f4).
> >
> > Is there something I have missed in how one is supposed to use optim
> > with Nelder-Mead? Or is this actually a bug in the implementation?
> >
> > $ sessionInfo()
> > R version 2.15.1 (2012-06-22)
> > Platform: x86_64-suse-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=nn_NO.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=nn_NO.UTF-8        LC_COLLATE=nn_NO.UTF-8
> >  [5] LC_MONETARY=nn_NO.UTF-8    LC_MESSAGES=nn_NO.UTF-8
> >  [7] LC_PAPER=C                 LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=nn_NO.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
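If my guess about the initial step rule is right, a workaround (my own sketch, not an official fix) is to recentre the problem before calling optim, so that the starting values are of order 1 and the automatic step becomes 0.1 no matter where the optimum sits. Here shift is a hypothetical rough guess at where the parameters lie:

## Recentring sketch: optimise in offset coordinates d = xy - shift.
## 'shift' is an assumed rough guess at the parameter location.
f <- function(xy, mu1, mu2) dnorm(xy[1] - mu1) * dnorm(xy[2] - mu2)
f4 <- function(xy) -f(xy, -3000, 50000)
shift <- c(-3000, 50000)
res <- optim(c(1, 1), function(d) f4(d + shift))  # initial step is now 0.1
res$par + shift  # solution in the original coordinates

With this recentring, the first four evaluations for f1–f4 are identical up to the shift, which is the behaviour I expected from optim in the first place.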