> Adding a setOptimizer to IterativeSolver.

Do you mean MLR here? IterativeSolver is implemented by different solvers, so I don't think adding a method like this makes sense there.

In the case of MLR, a better alternative that involves a bit more work is to create a Generalized Linear Model framework that provides implementations of the most common linear models (ridge, lasso, etc.). I had already started work on this here <https://github.com/thvasilo/flink/commits/glm>, but never got around to opening a PR. The relevant JIRA is here <https://issues.apache.org/jira/browse/FLINK-2013>. Having a setOptimizer method on GeneralizedLinearModel (with some restrictions/warnings regarding the choice of optimizer and regularization) would be the preferred option, for me at least.
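To make that concrete, here is roughly the shape I have in mind. Everything below is an illustrative sketch, not code that exists in FlinkML or in my branch; the restriction check is just one example (L1 regularization can't be paired with L-BFGS):

    trait Optimizer {
      def supportsL1Regularization: Boolean
    }

    abstract class GeneralizedLinearModel {
      // Each concrete model (ridge, lasso, ...) picks a sensible default.
      protected def defaultOptimizer: Optimizer
      protected def usesL1Regularization: Boolean

      private var userOptimizer: Option[Optimizer] = None

      def setOptimizer(opt: Optimizer): this.type = {
        // Reject invalid combinations up front, e.g. L1 with L-BFGS.
        require(!usesL1Regularization || opt.supportsL1Regularization,
          "The chosen optimizer does not support L1 regularization")
        userOptimizer = Some(opt)
        this
      }

      protected def optimizer: Optimizer =
        userOptimizer.getOrElse(defaultOptimizer)
    }

That keeps the simple path simple (the defaults just work), while letting people who know what they are doing swap in their own solver.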
Other than that, the list looks fine :)

On Tue, Mar 29, 2016 at 9:32 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

> OK, I'm trying to respond to you and Till in one thread, so someone call me out if I missed a point, but here goes:
>
> SGD Predicting Vectors: There was discussion in the past regarding this; at the time it was decided to go with only Doubles for simplicity. I feel strongly that there is cause now for predicting vectors. This should be a separate PR. I'll open an issue, and we can refer to the earlier mailing list thread and reopen the discussion on the best way to proceed.
>
> Warm Starts: Basically all that needs to be done here is for the iterative solver to keep track of what iteration it is on, start from that iteration if WarmStart == true, and then go another N iterations. I don't think savepoints solve this, because of the way step sizes are calculated in SGD, though I don't know enough about savepoints to say for sure. As Till said, and I agree, a very simple fix. Use cases: testing how new features (e.g. step sizes) increase/decrease convergence, e.g. fit a model in 1000-data-point bursts, measure the error, and see how it decreases as time goes on. Also, model updates, e.g. I have a huge model that gets trained on a year of data and takes a day or two to do so, but after that I just want to update it nightly with the data from the last 24 hours, or at the extreme, online learning, where every new data point updates the model.
>
> Model Grading Metrics: I'll chime in on the PR you mentioned.
>
> Weight Arrays vs. Weight Vectors: The consensus seems to be that winding/unwinding arrays of matrices into vectors is best done inside the methods that need such functionality. I'm OK with that, as I have such things working rather elegantly, but wanted to throw it out there anyway.
>
> BLAS ops for matrices: I'll take care of this in my code.
>
> Adding a 'setOptimizer' parameter to IterativeSolver: Theodore deferred to Till, and Till said open a PR. I'll make the default SimpleSGD to maintain backwards compatibility.
>
> New issues to create:
> [ ] Optimizer to predict Vectors or Doubles and maintain backwards compatibility.
> [ ] Warm start functionality.
> [ ] setOptimizer on IterativeSolver, with the default set to SimpleSGD.
> [ ] Add a neuralnets package to FlinkML (multilayer perceptron is the first iteration, other flavors to follow).
>
> Let me know if I missed anything. I'm guessing you guys are done for the day, so I'll wait until tomorrow night my time (Chicago) before I move ahead on anything, to give you a chance to respond.
>
> Thanks!
> tg
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." -Virgil*
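A side note on the warm-start point above: the bookkeeping could be as small as the sketch below. All the names are illustrative (this is not the current IterativeSolver API); the point is only that the step-size schedule has to see the global iteration count.

    class WarmStartableSolver(stepsize0: Double) {
      // Remember how many iterations have already run, so a second call
      // continues the schedule instead of restarting from iteration 1.
      private var iterationsDone = 0

      def runIterations(n: Int): Unit = {
        for (i <- (iterationsDone + 1) to (iterationsDone + n)) {
          // Typical SGD decay: the effective step size depends on the global
          // iteration count, which is why resetting to zero changes results.
          val effectiveStepsize = stepsize0 / math.sqrt(i)
          println(s"iteration $i, step size $effectiveStepsize") // stand-in for the actual SGD update
        }
        iterationsDone += n
      }
    }

With this, two calls of runIterations(500) produce the same step-size sequence as one call of runIterations(1000), which is exactly the property being asked for.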
> On Tue, Mar 29, 2016 at 4:11 AM, Theodore Vasiloudis <theodoros.vasilou...@gmail.com> wrote:
>
> > Hello Trevor,
> >
> > These are indeed a lot of issues; let's see if we can fit the discussion for all of them in one thread.
> >
> > I'll add some comments inline.
> >
> > - Expand SGD to allow for predicting vectors instead of just Doubles.
> >
> > We have discussed this in the past and at that point decided that it didn't make sense to change the base SGD implementation to accommodate vectors. The alternatives that were presented at the time were to abstract away the type of the input/output in the Optimizer (allowing for both Vectors and Doubles), or to create specialized classes for each case. That also gives us greater flexibility in terms of optimizing performance.
> >
> > In terms of the ANN, I think you can hide away the Vectors in the implementation of the ANN model and use the Optimizer interface as is, like A. Ulanov did with the Spark ANN implementation <https://github.com/apache/spark/pull/7621/files>.
> >
> > - Allow for 'warm starts'
> >
> > I like the idea of having a partialFit-like function; could you present a couple of use cases where we might use it? I'm wondering if savepoints already cover this functionality.
> >
> > - A library of model grading metrics.
> >
> > We have a (perpetually) open PR <https://github.com/apache/flink/pull/871> for an evaluation framework. Could you expand on "Having 'calculate RSquare' as a built in method for every regressor doesn't seem like an efficient way to do this long term"?
> >
> > - BLAS for matrix ops (this was talked about earlier)
> >
> > This will be a good addition. If the ops are specific to the ANN implementation, however, I would hide them away from the rest of the code (and include them in that PR only) until another use case comes up.
> >
> > - A neural net has Arrays of matrices of weights (instead of just a vector).
> >
> > Yes, this is probably not the most efficient way to do this, but it's the "least API breaking", I'm afraid.
> >
> > - The linear regression implementation currently presumes it will be using SGD, but I think that should be 'settable' as a parameter.
> >
> > The original Optimizer was written the way you described, but we changed it later, IIRC to make it more accessible (e.g. for users that don't know that you can't match L1 regularization with L-BFGS). Maybe Till can say more about the other reasons this was changed.
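On the Arrays-of-matrices point above, the winding/unwinding in question could look like the sketch below. The helper names are mine; it assumes FlinkML's DenseMatrix(numRows, numCols, data) constructor and the fact that DenseMatrix stores its entries as a flat column-major array:

    import org.apache.flink.ml.math.{DenseMatrix, DenseVector}

    // Wind the per-layer weight matrices into one flat weight vector,
    // layer by layer, so the existing vector-based interfaces can consume it.
    def flatten(weights: Array[DenseMatrix]): DenseVector =
      DenseVector(weights.flatMap(_.data))

    // Rebuild the matrices from the flat vector, given each layer's shape.
    def unflatten(flat: DenseVector, shapes: Array[(Int, Int)]): Array[DenseMatrix] = {
      var offset = 0
      shapes.map { case (rows, cols) =>
        val size = rows * cols
        val layer = DenseMatrix(rows, cols, flat.data.slice(offset, offset + size))
        offset += size
        layer
      }
    }

As noted, copying every layer on every gradient step is the price of staying inside the current weight-vector-based API.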
> > On Mon, Mar 28, 2016 at 8:01 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
> >
> > > Hey,
> > >
> > > I have a working prototype of a multilayer perceptron implementation working in Flink.
> > >
> > > I made every possible effort to utilize existing code where possible.
> > >
> > > In the process of doing this there were some hacks I want/need, and I think this should be broken up into multiple PRs, and possibly the whole thing should be abstracted out, because the MLP implementation I came up with is itself designed to be extendable to Long Short-Term Memory networks.
> > >
> > > Top level, here are some of the sub-PRs:
> > >
> > > - Expand SGD to allow for predicting vectors instead of just Doubles. This allows the same NN code (and other algos) to be used for classification, transformations, and regressions.
> > >
> > > - Allow for 'warm starts' -> this requires adding a parameter to IterativeSolver that basically starts on iteration N. This is somewhat akin to the idea of partial fits in sklearn, OR making the iterative solver have some sort of internal counter, so that when you call 'fit' it just runs another N iterations (set by SetIterations) instead of assuming it is back at zero. This might seem trivial, but it has a significant impact on step size calculations.
> > >
> > > - A library of model grading metrics. Having 'calculate RSquare' as a built-in method for every regressor doesn't seem like an efficient way to do this long term.
> > >
> > > - BLAS for matrix ops (this was talked about earlier).
> > >
> > > - A neural net has Arrays of matrices of weights (instead of just a vector). Currently I flatten the array of matrices out into a weight vector and reassemble it into an array of matrices, though this is probably not super efficient.
> > >
> > > - The linear regression implementation currently presumes it will be using SGD, but I think that should be 'settable' as a parameter, because if not, why do we have all of those other nice SGD methods just hanging out? Similarly, the loss function / partial loss is hard-coded. I recommend making the current setup the 'defaults' of a 'setOptimizer' method, i.e. if you just want to run an MLR you can do it based on the examples, but if you want to use a fancy optimizer you can create one from existing methods, or make your own, and then call something like `mlr.setOptimizer(myOptimizer)`.
> > >
> > > - and more
> > >
> > > At any rate, if some people could weigh in / direct me on how to proceed, that would be swell.
> > >
> > > Thanks!
> > > tg
> > >
> > > Trevor Grant
> > > Data Scientist
> > > https://github.com/rawkintrevo
> > > http://stackexchange.com/users/3002022/rawkintrevo
> > > http://trevorgrant.org
> > >
> > > *"Fortunate is he, who is able to know the causes of things." -Virgil*
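For reference, usage of the `mlr.setOptimizer(myOptimizer)` idea quoted above could look roughly like the following. The setOptimizer call itself is the proposal and doesn't exist yet; GradientDescentL2 and the setIterations/setStepsize parameters follow the current optimization package, but treat the exact names as assumptions:

    import org.apache.flink.api.scala._
    import org.apache.flink.ml.common.LabeledVector
    import org.apache.flink.ml.optimization.GradientDescentL2
    import org.apache.flink.ml.regression.MultipleLinearRegression

    val env = ExecutionEnvironment.getExecutionEnvironment
    val trainingData: DataSet[LabeledVector] = ??? // data loading omitted

    // Build a "fancy" optimizer from the existing pieces...
    val myOptimizer = GradientDescentL2()
      .setIterations(200)
      .setStepsize(0.1)

    // ...and hand it to the learner instead of the hard-coded default SGD.
    val mlr = MultipleLinearRegression()
    mlr.setOptimizer(myOptimizer) // proposed method, not in FlinkML yet
    mlr.fit(trainingData)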