Hi guys, I wanted to look into this as well. I think the main issue is the speed of the objective function. Running @time on it showed a large amount of memory being allocated, and checking the types revealed that pulling x and y out of data gave them type Any.
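Here is a rough sketch of the two changes, just to make them concrete. This is illustrative only, not the code from the gist: the loglik name, the plain-logit term and the column layout are made up, while file, data_raw, nObs, nChoices, nVar and T follow Miles's closure example further down the thread.

# Diagnosis: if the raw matrix read from disk is Any-typed, every x and y
# pulled out of `data` is Any too, and the objective allocates on every operation.
data_raw = readcsv(file)
println(eltype(data_raw))        # if this prints Any, the objective is type-unstable

# Change 1: convert once, outside the objective function.
data = convert(Array{Float64,3},
               reshape(data_raw, nObs, nChoices*(1+nVar), T))

# Change 2: accumulate a running sum instead of storing the vector of
# likelihood contributions (plain logit term here, only to show the loop).
function loglik(beta::Vector{Float64}, data::Array{Float64,3})
    ll = 0.0
    for i in 1:size(data, 1)
        u = 0.0
        for k in 1:length(beta)
            u += data[i, k, 1] * beta[k]   # x_i' * beta without temporaries
        end
        y = data[i, length(beta) + 1, 1]   # made-up position of the outcome
        ll += y * u - log(1.0 + exp(u))
    end
    return -ll
end

loglik(ones(nVar), data)          # first call compiles
@time loglik(ones(nVar), data)    # second call measures run time and allocation

Timing the second call (as Miles suggested) keeps compilation out of the measurement.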
So I converted the data to Array{Float64,3}, and I changed the function to keep a cumulative sum instead of storing the whole vector of likelihood contributions.

Running the objective function 50 times:
- without the convert: 9.18s total
- with the convert and the original function: 4.15s
- with the convert and the new function: 1.49s
- with Matlab: 0.64s

So Matlab still appears to be about 2.5 times faster, but I am guessing Matlab uses SIMD instructions for its matrix multiplications, so we would need to route the matrix products through BLAS in Julia to get a really fair comparison. In any case, fixing the type of the input and just summing inside the loop gives a 6x speed-up.

PS: Running the full optimization with the convert and the new function I get 4.8s; with my Matlab code I get 4s.

I could not commit to Holger's gist, so I forked it: https://gist.github.com/tlamadon/58c47c115f8cf2388e89. Please check that I have not done anything stupid, but the output looked similar.

Holger, I hope you are having a good time at home (or in Paris?). And a World Cup note: Allez les bleus!

very best,
t.

On Tuesday, 27 May 2014 14:03:30 UTC+1, Holger Stichnoth wrote:
>
> Hi John, hi Miles,
>
> Thanks to both of you. I did not have time to look into this over the weekend; I will do so in the next couple of days. I have already uploaded the Matlab files for comparison: https://gist.github.com/stichnoth/7f251ded83dcaa384273
>
> Holger
>
>
> On Thursday, 22 May 2014 23:03:58 UTC+1, John Myles White wrote:
>>
>> Yeah, this case is tricky enough that we really need to get down to the lowest details:
>>
>> (1) Do Julia and Matlab perform similar numbers of function evaluations?
>>
>> (2) If they don't perform similar numbers of function evaluations, is one of them producing a better solution? Is the one that's producing a better solution doing more function evaluations?
>>
>> (3) If they're doing different numbers of function evaluations and the one that does _fewer_ evaluations also produces a better solution, what's the reason? For example, is our line search default less effective for this problem than the Matlab line search? If you try other line search algorithms, do the results stay the same?
>>
>> Unfortunately, answering all of these reliably may take us all pretty far down the rabbit hole. But they're worth pushing on systematically.
>>
>> -- John
>>
>> On May 22, 2014, at 2:59 PM, Miles Lubin <miles...@gmail.com> wrote:
>>
>> I can get another 50% speedup by:
>>
>> - Running the optimization twice and timing the second run only; this is the more appropriate way to benchmark Julia because it excludes the function compilation time
>> - Setting autodiff=true
>> - Breaking up the long chains of sums, which appear to be slow
>>
>> At this point one really needs to compare the number of function evaluations in each method, as John suggested.
>>
>> On Thursday, May 22, 2014 9:53:36 AM UTC-4, Holger Stichnoth wrote:
>>>
>>> Thanks, it's faster now (by roughly a factor of 3 on my computer), but still considerably slower than fminunc:
>>>
>>> Averages over 20 runs:
>>> Julia/Optim.optimize: 10.5s
>>> Matlab/fminunc: 2.6s
>>>
>>> Here are my Matlab settings:
>>> options = optimset('Display', 'iter', ...
>>>     'MaxIter', 2500, 'MaxFunEvals', 500000, ...
>>>     'TolFun', 1e-6, 'TolX', 1e-6, ...
>>>     'GradObj', 'off', 'DerivativeCheck', 'off');
>>>
>>> startb = ones(1,nVar)';
>>> [estim_clo, ll_clo] = ...
>>>     fminunc(@(param)clogit_ll(param,data), ...
>>>     startb,options);
>>>
>>> Could the speed issue be related to the following messages that I get when I run the Julia code?
>>>
>>> C:\Users\User\Documents\References\Software\Julia\mlubin>julia main.jl
>>> Warning: could not import Base.foldl into NumericExtensions
>>> Warning: could not import Base.foldr into NumericExtensions
>>> Warning: could not import Base.sum! into NumericExtensions
>>> Warning: could not import Base.maximum! into NumericExtensions
>>> Warning: could not import Base.minimum! into NumericExtensions
>>>
>>>
>>> On Thursday, 22 May 2014 14:18:36 UTC+1, Miles Lubin wrote:
>>>>
>>>> I was able to get a nearly 5x speedup by avoiding the matrix allocation and making the accumulators type stable: https://gist.github.com/mlubin/055690ddf2466e98bba6
>>>>
>>>> How does this compare with Matlab now?
>>>>
>>>> On Thursday, May 22, 2014 6:38:44 AM UTC-4, Holger Stichnoth wrote:
>>>>>
>>>>> @ John: You are right, when I specify the function as clogit_ll(beta::Vector) instead of clogit_ll(beta::Vector{Float64}), autodiff = true works fine. Thanks for your help!
>>>>>
>>>>> @ Tim: I have set the rather strict default convergence criteria of Optim.optimize to Matlab's default values for fminunc, but the speed difference is still there.
>>>>>
>>>>> @ Miles/John: Getting rid of the global variables through closures and devectorizing made the optimization _slower_, not faster, in my case: https://gist.github.com/stichnoth/7f251ded83dcaa384273. I was surprised to see this, as I had expected a speed increase myself.
>>>>>
>>>>>
>>>>> On Wednesday, 21 May 2014 16:48:51 UTC+1, Miles Lubin wrote:
>>>>>>
>>>>>> Just to extend on what John said, I also think that if you can restructure the code to devectorize it and avoid using global variables, you'll see *much* better performance.
>>>>>>
>>>>>> The way to avoid globals is by using closures, for example:
>>>>>>
>>>>>> function foo(x, data)
>>>>>>     ...
>>>>>> end
>>>>>>
>>>>>> ...
>>>>>> data_raw = readcsv(file)
>>>>>> data = reshape(data_raw, nObs, nChoices*(1+nVar), T)
>>>>>>
>>>>>> Optim.optimize(x -> foo(x, data), ...)
>>>>>>
>>>>>> On Tuesday, May 20, 2014 11:47:39 AM UTC-4, John Myles White wrote:
>>>>>>>
>>>>>>> Glad that you were able to figure out the source of your problems.
>>>>>>>
>>>>>>> It would be good to get a sense of the amount of time spent inside your objective function vs. the amount of time spent in the code for optimize(). In general, my experience is that >>90% of the compute time for an optimization problem is spent in the objective function itself. If you instrument your objective function to produce timing information on each call, that would help a lot, since you could then get a sense of how much time is being spent in the code for optimize() after accounting for your function itself.
>>>>>>>
>>>>>>> It’s also worth keeping in mind that your use of implicit finite differencing means that your objective function is being called a lot more times than theoretically necessary, so that any minor performance issue in it will very substantially slow down the solver.
>>>>>>>
>>>>>>> Regarding your objective function’s code, I suspect that the combination of global variables and memory-allocating vectorized arithmetic means that your objective function might be a good bit slower in Julia than in Matlab. Matlab seems to be a little better about garbage collection for vectorized arithmetic, and Julia is generally not able to optimize code involving global variables.
>>>>>>>
>>>>>>> Hope that points you in the right direction.
>>>>>>>
>>>>>>> — John
>>>>>>>
>>>>>>> On May 20, 2014, at 8:34 AM, Holger Stichnoth <stic...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Andreas,
>>>>>>> hi John,
>>>>>>> hi Miles (via julia-opt, where I mistakenly also posted my question yesterday),
>>>>>>>
>>>>>>> Thanks for your help. Here is the link to the Gist I created: https://gist.github.com/anonymous/5f95ab1afd241c0a5962
>>>>>>>
>>>>>>> In the process of producing a minimal (non-)working example, I discovered that the unexpected results are due to the truncation of the logit choice probabilities in the objective function. Optim.optimize() is sensitive to this when method = :l_bfgs is used. With method = :nelder_mead, everything works fine. When I comment out the truncation, :l_bfgs works as well. However, I need to increase the xtol from its default of 1e-12 to at least 1e-10, otherwise I get the error that the linesearch failed to converge.
>>>>>>>
>>>>>>> I guess I should just do without the truncation. The logit probabilities are between 0 and 1 by construction anyway. I had just copied the truncation code from a friend who had told me that probabilities that are too close to 0 or 1 sometimes cause numerical problems in his Matlab code of the same function. With Optim.optimize(), it seems to be the other way around, i.e. moving the probabilities further away from 0 or 1 (even by tiny amounts) reduces the stability of the (gradient-based) algorithm.
>>>>>>>
>>>>>>> So for me, the problem is solved. The problem was not with Optim.jl, but with my own code.
>>>>>>>
>>>>>>> The only other thing that I discovered when trying out Julia and Optim.jl is that the optimization is currently considerably slower than Matlab's fminunc. From the Gist I provided above, are there any potential performance improvements that I am missing out on?
>>>>>>>
>>>>>>> Best wishes,
>>>>>>> Holger
>>>>>>>
>>>>>>>
>>>>>>> On Monday, 19 May 2014 14:51:16 UTC+1, John Myles White wrote:
>>>>>>>>
>>>>>>>> If you can, please do share an example of your code. Logit-style models are in general numerically unstable, so it would be good to see how exactly you’ve coded things up.
>>>>>>>>
>>>>>>>> One thing you may be able to do is use automatic differentiation via the autodiff = true keyword to optimize, but that assumes that your objective function is written in completely pure Julia code (which means, for example, that your code must not call any of the functions not written in Julia that are provided by Distributions.jl).
>>>>>>>>
>>>>>>>> — John
>>>>>>>>
>>>>>>>> On May 19, 2014, at 4:09 AM, Andreas Noack Jensen <andreasno...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> What is the output of versioninfo() and Pkg.installed("Optim")?
>>>>>>>> Also, would it be possible to make a gist with your code?
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-05-19 12:44 GMT+02:00 Holger Stichnoth <stic...@gmail.com>:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I installed Julia a couple of days ago and was impressed how easy it was to make the switch from Matlab and to parallelize my code (something I had never done before in any language; I'm an economist with only limited programming experience, mainly in Stata and Matlab).
>>>>>>>>>
>>>>>>>>> However, I ran into a problem when using Optim.jl for Maximum Likelihood estimation of a conditional logit model. With the default Nelder-Mead algorithm, optimize from the Optim.jl package gave me the same result that I had obtained in Stata and Matlab.
>>>>>>>>>
>>>>>>>>> With gradient-based methods such as BFGS, however, the algorithm jumped from the starting values to parameter values that are completely different. This happened for all the starting values I tried, including the case in which I took a vector that is close to the optimum from the Nelder-Mead algorithm.
>>>>>>>>>
>>>>>>>>> The problem seems to be that the algorithm tried values so large (in absolute value) that this caused problems for the objective function, where I call exponential functions into which these parameter values enter. As a result, the optimization based on the BFGS algorithm did not produce the expected optimum.
>>>>>>>>>
>>>>>>>>> While I could try to provide the analytical gradient in this simple case, I was planning to use Julia for Maximum Likelihood or Simulated Maximum Likelihood estimation in cases where the gradient is more difficult to derive, so it would be good if I could make the optimizer run with numerical gradients as well.
>>>>>>>>>
>>>>>>>>> I suspect that my problems with optimize from Optim.jl could have something to do with the gradient() function. In the example below, for instance, I do not understand why the output of the gradient function includes values such as 11470.7, given that the function values differ only minimally.
>>>>>>>>>
>>>>>>>>> Best wishes,
>>>>>>>>> Holger
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> julia> Optim.gradient(clogit_ll,zeros(4))
>>>>>>>>> 60554544523933395e-22
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>>
>>>>>>>>> 14923.564009972584
>>>>>>>>> -60554544523933395e-22
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>>
>>>>>>>>> 14923.565228435104
>>>>>>>>> 0
>>>>>>>>> 60554544523933395e-22
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>>
>>>>>>>>> 14923.569064311248
>>>>>>>>> 0
>>>>>>>>> -60554544523933395e-22
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>>
>>>>>>>>> 14923.560174904109
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>> 60554544523933395e-22
>>>>>>>>> 0
>>>>>>>>>
>>>>>>>>> 14923.63413848258
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>> -60554544523933395e-22
>>>>>>>>> 0
>>>>>>>>>
>>>>>>>>> 14923.495218282553
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>> 60554544523933395e-22
>>>>>>>>>
>>>>>>>>> 14923.58699717058
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>> 0
>>>>>>>>> -60554544523933395e-22
>>>>>>>>>
>>>>>>>>> 14923.54224130672
>>>>>>>>> 4-element Array{Float64,1}:
>>>>>>>>> -100.609
>>>>>>>>> 734.0
>>>>>>>>> 11470.7
>>>>>>>>> 3695.5
>>>>>>>>>
>>>>>>>>> function clogit_ll(beta::Vector)
>>>>>>>>>
>>>>>>>>>     # Print the parameters and the return value to
>>>>>>>>>     # check how gradient() and optimize() work.
>>>>>>>>>     println(beta)
>>>>>>>>>     println(-sum(compute_ll(beta,T,0)))
>>>>>>>>>
>>>>>>>>>     # compute_ll computes the individual likelihood contributions
>>>>>>>>>     # in the sample. T is the number of periods in the panel. The 0
>>>>>>>>>     # is not used in this simple example. In related functions, I
>>>>>>>>>     # pass on different values here to estimate finite mixtures of
>>>>>>>>>     # the conditional logit model.
>>>>>>>>>     return -sum(compute_ll(beta,T,0))
>>>>>>>>> end
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Med venlig hilsen
>>>>>>>>
>>>>>>>> Andreas Noack Jensen
>>>>>>>
>>