OK, I'm trying to respond to you and Till in one thread, so call me out if I missed a point, but here goes:
SGD Predicting Vectors: There was discussion in the past regarding this; at the time it was decided to go with only Doubles for simplicity. I feel strongly that there is now cause for predicting vectors. This should be a separate PR. I'll open an issue, and we can refer back to the earlier mailing list discussion and reopen the question of the best way to proceed.

Warm Starts: Basically all that needs to be done here is for the iterative solver to keep track of what iteration it is on, start from that iteration if WarmStart == true, and then go another N iterations. I don't think savepoints solve this, because of the way step sizes are calculated in SGD, though I don't know enough about savepoints to say for sure. As Till said, and I agree, a very simple fix.

Use cases: Testing how new features (e.g. step sizes) speed up or slow down convergence, e.g. fit a model in 1000-data-point bursts, measure the error, and see how it decreases as time goes on. Also, model updates: e.g. I have a huge model that gets trained on a year of data and takes a day or two to do so, but after that I just want to update it nightly with the data from the last 24 hours. Or, at the extreme, online learning, where every new data point updates the model.

Model Grading Metrics: I'll chime in on the PR you mentioned.

Weight Arrays vs. Weight Vectors: The consensus seems to be that winding/unwinding arrays of matrices into vectors is best done inside the methods that need such functionality. I'm OK with that, as I have such things working rather elegantly, but I wanted to throw it out there anyway.

BLAS ops for matrices: I'll take care of this in my code.

Adding a 'setOptimizer' parameter to IterativeSolver: Theodore deferred to Till, and Till said to open a PR. I'll make the default SimpleSGD to maintain backwards compatibility.

New issues to create:
[ ] Optimizer to predict Vectors or Doubles, maintaining backwards compatibility.
[ ] Warm start functionality.
[ ] setOptimizer for IterativeSolver, with the default set to SimpleSGD.
[ ] Add a neuralnets package to FlinkML (multilayer perceptron as the first iteration, other flavors to follow).

To make some of this concrete, I've appended a few rough sketches (warm starts, vector labels, weight flattening, setOptimizer) at the very bottom of this mail, below the quoted text.

Let me know if I missed anything. I'm guessing you guys are done for the day, so I'll wait until tomorrow night my time (Chicago) before I move ahead on anything, to give you a chance to respond.

Thanks!
tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Tue, Mar 29, 2016 at 4:11 AM, Theodore Vasiloudis <
theodoros.vasilou...@gmail.com> wrote:

> Hello Trevor,
>
> These are indeed a lot of issues, let's see if we can fit the discussion
> for all of them in one thread.
>
> I'll add some comments inline.
>
> - Expand SGD to allow for predicting vectors instead of just Doubles.
>
> We have discussed this in the past and at that point decided that it
> didn't make sense to change the base SGD implementation to accommodate
> vectors. The alternatives that were presented at the time were to
> abstract away the type of the input/output in the Optimizer (allowing
> for both Vectors and Doubles), or to create specialized classes for
> each case. That also gives us greater flexibility in terms of
> optimizing performance.
>
> In terms of the ANN, I think you can hide away the Vectors in the
> implementation of the ANN model, and use the Optimizer interface as is,
> like A.
> Ulanov did with the Spark ANN implementation
> <https://github.com/apache/spark/pull/7621/files>.
>
> - Allow for 'warm starts'
>
> I like the idea of having a partial_fit-like function, could you
> present a couple of use cases where we might use it? I'm wondering if
> savepoints already cover this functionality.
>
> - A library of model grading metrics.
>
> We have a (perpetually) open PR
> <https://github.com/apache/flink/pull/871> for an evaluation framework.
> Could you expand on "Having 'calculate RSquare' as a built in method
> for every regressor doesn't seem like an efficient way to do this long
> term."
>
> - BLAS for matrix ops (this was talked about earlier)
>
> This will be a good addition. If they are specific to the ANN
> implementation however I would hide them away from the rest of the code
> (and include in that PR only) until another use case comes up.
>
> - A neural net has Arrays of matrices of weights (instead of just a
> vector).
>
> Yes this is probably not the most efficient way to do this, but it's
> the "least API breaking" I'm afraid.
>
> - The linear regression implementation currently presumes it will be
> using SGD but I think that should be 'settable' as a parameter
>
> The original Optimizer was written the way you described, but we
> changed it later IIRC to make it more accessible (e.g. for users that
> don't know that you can't match L1 regularization with L-BFGS). Maybe
> Till can say more about the other reasons this was changed.
>
> On Mon, Mar 28, 2016 at 8:01 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
> > Hey,
> >
> > I have a working prototype of a multilayer perceptron implementation
> > in Flink.
> >
> > I made every possible effort to utilize existing code when possible.
> >
> > In the process of doing this there were some hacks I want/need, and I
> > think this should be broken up into multiple PRs, possibly abstracting
> > out the whole thing, because the MLP implementation I came up with is
> > itself designed to be extendable to Long Short-Term Memory networks.
> >
> > Top level, here are some of the sub-PRs:
> >
> > - Expand SGD to allow for predicting vectors instead of just Doubles.
> > This allows the same NN code (and other algos) to be used for
> > classification, transformations, and regressions.
> >
> > - Allow for 'warm starts' -> this requires adding a parameter to
> > IterativeSolver that basically starts on iteration N. This is
> > somewhat akin to the idea of partial fits in sklearn, OR making the
> > iterative solver have some sort of internal counter so that when you
> > call 'fit' it just runs another N iterations (which is set by
> > SetIterations) instead of assuming it is back to zero. This might
> > seem trivial but has significant impact on step size calculations.
> >
> > - A library of model grading metrics. Having 'calculate RSquare' as a
> > built in method for every regressor doesn't seem like an efficient
> > way to do this long term.
> >
> > - BLAS for matrix ops (this was talked about earlier)
> >
> > - A neural net has Arrays of matrices of weights (instead of just a
> > vector). Currently I flatten the array of matrices out into a weight
> > vector and reassemble it into an array of matrices, though this is
> > probably not super efficient.
> > - The linear regression implementation currently presumes it will be
> > using SGD, but I think that should be 'settable' as a parameter,
> > because if not, why do we have all of those other nice SGD methods
> > just hanging out? Similarly, the loss function / partial loss is hard
> > coded. I recommend making the current setup the 'defaults' of a
> > 'setOptimizer' method. I.e. if you want to just run an MLR you can do
> > it based on the examples, but if you want to use a fancy optimizer
> > you can create one from existing methods, or make your own, then call
> > something like `mlr.setOptimizer(myOptimizer)`
> >
> > - and more
> >
> > At any rate, if some people could weigh in / direct me on how to
> > proceed, that would be swell.
> >
> > Thanks!
> > tg
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things." -Virgil*
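
P.S. As promised, a few rough sketches. First, warm starts: the point is that the step size schedule depends on the global iteration number, so a savepoint of the weights alone wouldn't reproduce it. This is a minimal toy sketch of the bookkeeping I mean; `ToySolver`, `warmStart`, and `iterationCount` are illustrative names, not the actual IterativeSolver API:

```scala
// A minimal toy sketch of warm-start bookkeeping -- not the actual
// FlinkML IterativeSolver, just the idea of an iteration counter that
// survives across fit() calls.
object WarmStartSketch {

  class ToySolver(initialStepsize: Double, iterations: Int) {
    private var iterationCount = 0 // survives across fit() calls
    var warmStart: Boolean = false

    // decaying schedule: effective step = s / sqrt(t), so the global
    // iteration number matters, not just the current weights
    private def effectiveStepsize(t: Int): Double =
      initialStepsize / math.sqrt(t.toDouble)

    // fits y ~ w * x by batch gradient descent on squared loss
    def fit(data: Seq[(Double, Double)], weight0: Double): Double = {
      if (!warmStart) iterationCount = 0
      var w = weight0
      for (_ <- 1 to iterations) {
        iterationCount += 1
        val step = effectiveStepsize(iterationCount)
        val grad = data.map { case (x, y) => (w * x - y) * x }.sum / data.size
        w -= step * grad
      }
      w
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((1.0, 2.0), (2.0, 4.0), (3.0, 6.0)) // y = 2x
    val solver = new ToySolver(initialStepsize = 0.1, iterations = 100)
    val w1 = solver.fit(data, weight0 = 0.0)
    solver.warmStart = true
    val w2 = solver.fit(data, w1) // continues from iteration 100, not 1
    println(s"after cold run: $w1, after warm update: $w2")
  }
}
```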
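
Second, one possible way to abstract the label type so the same SGD machinery handles Doubles and Vectors, along the lines Theodore described. `LabelOps` is a hypothetical type class, not anything in FlinkML today, and I'm using Array[Double] as a stand-in for FlinkML's Vector:

```scala
// Hypothetical type class abstracting over the label/prediction type,
// so one SGD implementation can serve both Double and Vector labels.
object VectorLabelSketch {
  trait LabelOps[T] {
    def minus(a: T, b: T): T
    def squaredNorm(a: T): Double
  }

  implicit val doubleOps: LabelOps[Double] = new LabelOps[Double] {
    def minus(a: Double, b: Double): Double = a - b
    def squaredNorm(a: Double): Double = a * a
  }

  implicit val vectorOps: LabelOps[Array[Double]] = new LabelOps[Array[Double]] {
    def minus(a: Array[Double], b: Array[Double]): Array[Double] =
      a.zip(b).map { case (x, y) => x - y }
    def squaredNorm(a: Array[Double]): Double = a.map(x => x * x).sum
  }

  // squared loss written once, usable for both label types
  def squaredLoss[T](prediction: T, truth: T)(implicit ops: LabelOps[T]): Double =
    0.5 * ops.squaredNorm(ops.minus(prediction, truth))

  def main(args: Array[String]): Unit = {
    println(squaredLoss(1.5, 1.0))                         // scalar label
    println(squaredLoss(Array(1.0, 2.0), Array(0.0, 0.0))) // vector label
  }
}
```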
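
Third, the winding/unwinding of an MLP's array of weight matrices into a single flat vector (and back), roughly how I'm doing it now so the existing vector-based solver can update the weights. Again just a sketch; matrices are plain row-major Array[Array[Double]] for brevity:

```scala
// Sketch of the wind/unwind step: layer matrices are flattened into
// one Array[Double] for the solver, then rebuilt from their shapes.
object WeightFlattening {

  def flatten(layers: Seq[Array[Array[Double]]]): Array[Double] =
    layers.flatMap(_.flatten).toArray

  def unflatten(flat: Array[Double],
                shapes: Seq[(Int, Int)]): Seq[Array[Array[Double]]] = {
    var offset = 0
    shapes.map { case (rows, cols) =>
      // rebuild one row-major matrix starting at the current offset
      val m = Array.tabulate(rows, cols) { (r, c) =>
        flat(offset + r * cols + c)
      }
      offset += rows * cols
      m
    }
  }

  def main(args: Array[String]): Unit = {
    val w1 = Array(Array(1.0, 2.0), Array(3.0, 4.0)) // 2x2 layer
    val w2 = Array(Array(5.0, 6.0, 7.0))             // 1x3 layer
    val flat = flatten(Seq(w1, w2))
    val rebuilt = unflatten(flat, Seq((2, 2), (1, 3)))
    println(rebuilt.map(_.map(_.mkString(",")).mkString(";")).mkString(" | "))
  }
}
```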
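
Finally, the rough shape of the setOptimizer idea with SimpleSGD as the backwards-compatible default. The `Optimizer`, `SimpleSGD`, and `MultipleLinearRegression` definitions here are simplified stand-ins, not the real FlinkML classes or signatures:

```scala
// Pluggable optimizer with a backwards-compatible default.
trait Optimizer {
  def optimize(data: Seq[(Array[Double], Double)],
               initialWeights: Array[Double]): Array[Double]
}

// plain SGD on squared loss with a 1/sqrt(t) step size decay
class SimpleSGD(stepsize: Double, iterations: Int) extends Optimizer {
  def optimize(data: Seq[(Array[Double], Double)],
               initialWeights: Array[Double]): Array[Double] = {
    val w = initialWeights.clone()
    for (t <- 1 to iterations; (x, y) <- data) {
      val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
      val step = stepsize / math.sqrt(t.toDouble)
      for (i <- w.indices) w(i) -= step * err * x(i)
    }
    w
  }
}

class MultipleLinearRegression {
  // backwards compatible: SimpleSGD unless the user overrides it
  private var optimizer: Optimizer = new SimpleSGD(0.1, 100)

  def setOptimizer(opt: Optimizer): this.type = {
    optimizer = opt
    this
  }

  def fit(data: Seq[(Array[Double], Double)]): Array[Double] = {
    val dim = data.head._1.length
    optimizer.optimize(data, Array.fill(dim)(0.0))
  }
}

object SetOptimizerSketch {
  def main(args: Array[String]): Unit = {
    val data = Seq((Array(1.0), 2.0), (Array(2.0), 4.0), (Array(3.0), 6.0))
    val mlr = new MultipleLinearRegression().setOptimizer(new SimpleSGD(0.1, 200))
    println(mlr.fit(data).mkString(",")) // weight should approach 2.0
  }
}
```

The point being: anyone running the examples as-is never notices the change, while anyone who wants L-BFGS or a custom loss just swaps the optimizer in.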