On Sat, Mar 13, 2021 at 4:33 PM Spencer Graves <spencer.gra...@effectivedefense.org> wrote: > > Hi, Deepayan: > > > On 2021-03-13 01:27, Deepayan Sarkar wrote: > > On Sat, Mar 13, 2021 at 10:08 AM Spencer Graves > > <spencer.gra...@effectivedefense.org> wrote: > >> > >> TWO COMMENTS: > >> > >> > >> 1. DID YOU ASSIGN THE OUTPUT OF "optim" to an object, like "est <- > >> optim(...)"? If yes and if "optim" terminated normally, the 60,000+ > >> paramters should be there as est$par. See the documentation on "optim". > >> > >> > >> 2. WHAT PROBLEM ARE YOU TRYING TO SOLVE? > >> > >> > >> I hope you will forgive me for being blunt (or perhaps > >> bigoted), but > >> I'm skeptical about anyone wanting to use optim to estimate 60,000+ > >> parameters. With a situation like that, I think you would be wise to > >> recast the problem as one in which those 60,000+ parameters are sampled > >> from some hyperdistribution characterized by a small number of > >> hyperparameters. Then write a model where your observations are sampled > >> from distribution(s) controlled by these random parameters. Then > >> multiply the likelihood of the observations by the likelihood of the > >> hyperdistribution and integrate out the 60,000+ parameters, leaving only > >> a small number hyperparameters. > > > > Just a comment on this comment: I think it's perfectly reasonable to > > optimize 60k+ parameters with conjugate gradient. CG was originally > > developed to solve linear equations of the form Ax=b. If x was not > > large in size, one would just use solve(A, b) instead of an iterative > > method. > > > > Use of CG is quite common in image processing. A relatively small > > 300x300 image will give you 90k parameters. > > > > -Deepayan > > > > Thanks for this. > > > If both A and b are 300x300, then x will also be 300x300.
Sorry for being unclear: the images themselves (b or x) are viewed as a vector, so would be 90000x1. A would be 90000x90000, so essentially impossible to construct. CG can solve Ax=b as long as Ax can be evaluated (for arbitrary x). > What do you do in this case if A is not square or even ill > conditioned? A has to be p.d. > Do you care if you get only one of many possible or approximate > solutions, and the algorithm spends most of its time making adjustments > in a singular subspace that would have best been avoided? Well, in my experience, optim(method="CG") behaves quite badly if A is not full-rank. I don't think other implementations will be any better, but I am not sure. If you are interested in practical uses of this, we can discuss more off-list. -Deepayan > Spencer > > > ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.