I was thinking that all IterativeSolvers would benefit from a setOptimizer
method. I didn't realize you had been working on GLM. If that is the case
(which I think is wise), then feel free to put a setOptimizer in GLM, I'll
leave it in my NeuralNetworks, and let's just try to have some consistency
in the APIs... specifically, setOptimizer is a method that takes... an
optimizer. We can default to whatever is most appropriate for each learning
algorithm.
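To make the consistency point concrete, here is a rough sketch of the
convention I have in mind (plain Scala with illustrative names; these are
stand-ins, not the actual FlinkML classes): every learner exposes
setOptimizer and falls back to a sensible per-algorithm default if it is
never called.

    // Stand-in for whatever optimizer abstraction the learner uses.
    trait Optimizer {
      def optimize(iterations: Int): Unit
    }

    class SimpleSGD extends Optimizer {
      override def optimize(iterations: Int): Unit =
        println(s"plain SGD for $iterations iterations")
    }

    // Shared convention: every learner exposes setOptimizer and picks a
    // sensible per-algorithm default when the user never calls it.
    trait HasOptimizer {
      protected var optimizer: Optimizer = new SimpleSGD // default
      def setOptimizer(opt: Optimizer): this.type = {
        optimizer = opt
        this
      }
    }

    class MultipleLinearRegression extends HasOptimizer
    class MultiLayerPerceptron extends HasOptimizer

    // usage: new MultipleLinearRegression().setOptimizer(new SimpleSGD)

The only thing that needs to be shared is the method name and the existence
of a default; which optimizers actually make sense can stay up to each
algorithm.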
Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Tue, Mar 29, 2016 at 3:26 PM, Theodore Vasiloudis <
theodoros.vasilou...@gmail.com> wrote:

> > Adding a setOptimizer to IterativeSolver.
>
> Do you mean MLR here? IterativeSolver is implemented by different
> solvers; I don't think adding a method like this makes sense there.
>
> In the case of MLR, a better alternative that involves a bit more work
> is to create a Generalized Linear Model framework that provides
> implementations of the most common linear models (ridge, lasso, etc.).
> I had already started work on this here
> <https://github.com/thvasilo/flink/commits/glm>, but never got around
> to opening a PR. The relevant JIRA is here
> <https://issues.apache.org/jira/browse/FLINK-2013>. Having a
> setOptimizer method in GeneralizedLinearModel (with some
> restrictions/warnings regarding the choice of optimizer and
> regularization) would be the preferred option for me, at least.
>
> Other than that, the list looks fine :)
>
> On Tue, Mar 29, 2016 at 9:32 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
> > OK, I'm trying to respond to you and Till in one thread, so someone
> > call me out if I missed a point, but here goes:
> >
> > SGD predicting Vectors: There was discussion in the past regarding
> > this; at the time it was decided to go with only Doubles for
> > simplicity. I feel strongly that there is cause now for predicting
> > vectors. This should be a separate PR. I'll open an issue; we can
> > refer to the earlier mailing list discussion and reopen the question
> > of the best way to proceed.
> >
> > Warm starts: Basically all that needs to be done here is for the
> > iterative solver to keep track of which iteration it is on, start
> > from that iteration if WarmStart == true, and then go another N
> > iterations. I don't think savepoints solve this because of the way
> > step sizes are calculated in SGD, though I don't know enough about
> > savepoints to say for sure. As Till said, and I agree, it's a very
> > simple fix. Use cases: testing how new features (e.g. step sizes)
> > increase/decrease convergence, e.g. fit a model in 1000-data-point
> > bursts, measure the error, and see how it decreases as time goes on.
> > Also, model updates: e.g. I have a huge model that gets trained on a
> > year of data and takes a day or two to do so, but after that I just
> > want to update it nightly with the data from the last 24 hours, or at
> > the extreme, online learning, e.g. every new data point updates the
> > model.
> >
> > Model grading metrics: I'll chime in on the PR you mentioned.
> >
> > Weight arrays vs. weight vectors: The consensus seems to be that
> > winding/unwinding arrays of matrices into vectors is best done inside
> > the methods that need such functionality. I'm OK with that, as I have
> > such things working rather elegantly, but wanted to throw it out
> > there anyway.
> >
> > BLAS ops for matrices: I'll take care of this in my code.
> >
> > Adding a 'setOptimizer' parameter to IterativeSolver: Theodore
> > deferred to Till, Till said open a PR. I'll make the default
> > SimpleSGD to maintain backwards compatibility.
> >
> > New issues to create:
> > [ ] Optimizer to predict Vectors or Doubles and maintain backwards
> > compatibility.
> > [ ] Warm start functionality.
> > [ ] setOptimizer on IterativeSolver, with default SimpleSGD.
> > [ ] Add neuralnets package to FlinkML (multilayer perceptron is the
> > first iteration, other flavors to follow).
> >
> > Let me know if I missed anything. I'm guessing you guys are done for
> > the day, so I'll wait until tomorrow night my time (Chicago) before I
> > move ahead on anything, to give you a chance to respond.
> >
> > Thanks!
> > tg
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> >
> > On Tue, Mar 29, 2016 at 4:11 AM, Theodore Vasiloudis <
> > theodoros.vasilou...@gmail.com> wrote:
> >
> > > Hello Trevor,
> > >
> > > These are indeed a lot of issues; let's see if we can fit the
> > > discussion for all of them in one thread.
> > >
> > > I'll add some comments inline.
> > >
> > > > - Expand SGD to allow for predicting vectors instead of just
> > > > Doubles.
> > >
> > > We have discussed this in the past and at that point decided that
> > > it didn't make sense to change the base SGD implementation to
> > > accommodate vectors. The alternatives that were presented at the
> > > time were to abstract away the type of the input/output in the
> > > Optimizer (allowing for both Vectors and Doubles), or to create
> > > specialized classes for each case. That also gives us greater
> > > flexibility in terms of optimizing performance.
> > >
> > > In terms of the ANN, I think you can hide away the Vectors in the
> > > implementation of the ANN model, and use the Optimizer interface as
> > > is, like A. Ulanov did with the Spark ANN implementation
> > > <https://github.com/apache/spark/pull/7621/files>.
> > >
> > > > - Allow for 'warm starts'
> > >
> > > I like the idea of having a partial_fit-like function; could you
> > > present a couple of use cases where we might use it? I'm wondering
> > > if savepoints already cover this functionality.
> > >
> > > > - A library of model grading metrics.
> > >
> > > We have a (perpetually) open PR
> > > <https://github.com/apache/flink/pull/871> for an evaluation
> > > framework. Could you expand on "Having 'calculate RSquare' as a
> > > built-in method for every regressor doesn't seem like an efficient
> > > way to do this long term."
> > >
> > > > - BLAS for matrix ops (this was talked about earlier)
> > >
> > > This will be a good addition. If they are specific to the ANN
> > > implementation, however, I would hide them away from the rest of
> > > the code (and include them in that PR only) until another use case
> > > comes up.
> > >
> > > > - A neural net has Arrays of matrices of weights (instead of just
> > > > a vector).
> > >
> > > Yes, this is probably not the most efficient way to do this, but
> > > it's the "least API breaking", I'm afraid.
> > >
> > > > - The linear regression implementation currently presumes it will
> > > > be using SGD but I think that should be 'settable' as a parameter
> > >
> > > The original Optimizer was written the way you described, but we
> > > changed it later IIRC to make it more accessible (e.g. for users
> > > that don't know that you can't match L1 regularization with
> > > L-BFGS). Maybe Till can say more about the other reasons this was
> > > changed.
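As a rough illustration of the warm-start point being discussed above
(plain Scala, all names made up for the example rather than taken from
FlinkML): the essential change is that the solver remembers how many
iterations it has already run, so a decaying step size such as 1/sqrt(t)
continues from where it left off instead of restarting at t = 1.

    // Toy solver fitting y ~ w * x with squared loss; only meant to show
    // how an internal iteration counter keeps the step-size schedule going.
    class WarmStartableSolver(stepSize0: Double = 1.0) {
      private var iterationsDone = 0
      private var weight = 0.0

      // the usual 1/sqrt(t) decay; restarting t at 1 would change the schedule
      private def stepSize(t: Int): Double = stepSize0 / math.sqrt(t.toDouble)

      def fit(data: Seq[(Double, Double)], iterations: Int,
              warmStart: Boolean): Double = {
        if (!warmStart) { iterationsDone = 0; weight = 0.0 }
        for (i <- 1 to iterations) {
          val t = iterationsDone + i          // global iteration count
          for ((x, y) <- data)
            weight -= stepSize(t) * 2.0 * (weight * x - y) * x
        }
        iterationsDone += iterations
        weight
      }
    }

Two warm-started calls of 500 iterations then behave like a single call of
1000, which a cold restart of the step-size schedule would not.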
> > >
> > > On Mon, Mar 28, 2016 at 8:01 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> > > wrote:
> > >
> > > > Hey,
> > > >
> > > > I have a working prototype of a multilayer perceptron
> > > > implementation working in Flink.
> > > >
> > > > I made every effort to utilize existing code where possible.
> > > >
> > > > In the process of doing this there were some hacks I want/need,
> > > > and I think this should be broken up into multiple PRs, and
> > > > possibly abstract out the whole thing, because the MLP
> > > > implementation I came up with is itself designed to be extendable
> > > > to Long Short-Term Memory networks.
> > > >
> > > > At the top level, here are some of the sub-PRs:
> > > >
> > > > - Expand SGD to allow for predicting vectors instead of just
> > > > Doubles. This allows the same NN code (and other algos) to be
> > > > used for classification, transformations, and regressions.
> > > >
> > > > - Allow for 'warm starts' -> this requires adding a parameter to
> > > > IterativeSolver that basically starts on iteration N. This is
> > > > somewhat akin to the idea of partial fits in sklearn, OR making
> > > > the iterative solver have some sort of internal counter so that
> > > > when you call 'fit' it just runs another N iterations (which is
> > > > set by SetIterations) instead of assuming it is back to zero.
> > > > This might seem trivial but has a significant impact on step size
> > > > calculations.
> > > >
> > > > - A library of model grading metrics. Having 'calculate RSquare'
> > > > as a built-in method for every regressor doesn't seem like an
> > > > efficient way to do this long term.
> > > >
> > > > - BLAS for matrix ops (this was talked about earlier).
> > > >
> > > > - A neural net has Arrays of matrices of weights (instead of just
> > > > a vector). Currently I flatten the array of matrices out into a
> > > > weight vector and reassemble it into an array of matrices, though
> > > > this is probably not super efficient.
> > > >
> > > > - The linear regression implementation currently presumes it will
> > > > be using SGD, but I think that should be 'settable' as a
> > > > parameter, because if not, why do we have all of those other nice
> > > > SGD methods just hanging out? Similarly, the loss function /
> > > > partial loss is hard-coded. I recommend making the current setup
> > > > the 'defaults' of a 'setOptimizer' method. I.e. if you want to
> > > > just run an MLR you can do it based on the examples, but if you
> > > > want to use a fancy optimizer you can create it from existing
> > > > methods, or make your own, then call something like
> > > > `mlr.setOptimizer( myOptimizer )`.
> > > >
> > > > - and more
> > > >
> > > > At any rate, if some people could weigh in / direct me on how to
> > > > proceed, that would be swell.
> > > >
> > > > Thanks!
> > > > tg
> > > >
> > > > Trevor Grant
> > > > Data Scientist
> > > > https://github.com/rawkintrevo
> > > > http://stackexchange.com/users/3002022/rawkintrevo
> > > > http://trevorgrant.org
> > > >
> > > > *"Fortunate is he, who is able to know the causes of things."
> > > > -Virgil*
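As a rough illustration of the wind/unwind step described in the "array of
matrices" bullet above (again plain Scala with made-up names, not the
FlinkML representation): each layer's weight matrix, stored row-major
together with its shape, is concatenated into one flat vector for the
optimizer and reshaped back into matrices afterwards.

    import scala.collection.mutable.ArrayBuffer

    object WeightPacking {
      // A layer's weights: (numRows, numCols, row-major values).
      type Layer = (Int, Int, Array[Double])

      // Concatenate all layer matrices into one flat weight vector.
      def flatten(layers: Seq[Layer]): Array[Double] = {
        val out = new ArrayBuffer[Double]()
        layers.foreach { case (_, _, values) => out ++= values }
        out.toArray
      }

      // Rebuild the per-layer matrices from the flat vector and the shapes.
      def unflatten(shapes: Seq[(Int, Int)], flat: Array[Double]): Seq[Layer] = {
        var offset = 0
        shapes.map { case (rows, cols) =>
          val size = rows * cols
          val chunk = flat.slice(offset, offset + size)
          offset += size
          (rows, cols, chunk)
        }
      }
    }

The copy in each direction is the inefficiency mentioned in that bullet;
the trade-off is that the existing vector-based optimizer interface stays
untouched.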