Hello Trevor,

These are indeed a lot of issues; let's see if we can fit the discussion of all of them in one thread.
I'll add some comments inline.

> - Expand SGD to allow for predicting vectors instead of just Doubles.

We have discussed this in the past, and at that point decided that it didn't
make sense to change the base SGD implementation to accommodate vectors. The
alternatives presented at the time were to abstract away the type of the
input/output in the Optimizer (allowing for both Vectors and Doubles), or to
create specialized classes for each case. That also gives us greater
flexibility in terms of optimizing performance.

In terms of the ANN, I think you can hide the Vectors away in the
implementation of the ANN model and use the Optimizer interface as is, like
A. Ulanov did with the Spark ANN implementation
<https://github.com/apache/spark/pull/7621/files>.

> - Allow for 'warm starts'

I like the idea of having a partialFit-like function; could you present a
couple of use cases where we might use it? I'm wondering if savepoints
already cover this functionality.

> - A library of model grading metrics.

We have a (perpetually) open PR <https://github.com/apache/flink/pull/871>
for an evaluation framework. Could you expand on "Having 'calculate RSquare'
as a built-in method for every regressor doesn't seem like an efficient way
to do this long term"?

> - BLAS for matrix ops (this was talked about earlier)

This will be a good addition. If the ops are specific to the ANN
implementation, however, I would hide them away from the rest of the code
(and include them in that PR only) until another use case comes up.

> - A neural net has Arrays of matrices of weights (instead of just a vector).

Yes, this is probably not the most efficient way to do it, but it's the
"least API breaking" one, I'm afraid.

> - The linear regression implementation currently presumes it will be using
> SGD but I think that should be 'settable' as a parameter

The original Optimizer was written the way you described, but we changed it
later, IIRC to make it more accessible (e.g. for users that don't know that
you can't match L1 regularization with L-BFGS). Maybe Till can say more
about the other reasons this was changed.
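To make the "hide the Vectors in the ANN model" suggestion a bit more
concrete, here is a rough, untested sketch of how the per-layer weight
matrices could be packed into a single flat weight vector at the Optimizer
boundary and unpacked again inside the model (assuming the
DenseMatrix/DenseVector types from org.apache.flink.ml.math; the names and
the shape bookkeeping are only placeholders):

import org.apache.flink.ml.math.{DenseMatrix, DenseVector}

// Pack the per-layer weight matrices into one flat vector, so the existing
// Optimizer / WeightVector machinery does not need to change.
def flattenWeights(layers: Array[DenseMatrix]): DenseVector =
  DenseVector(layers.flatMap(_.data))

// Rebuild the matrices from the flat vector, given the layer shapes.
def unflattenWeights(weights: DenseVector, shapes: Array[(Int, Int)]): Array[DenseMatrix] = {
  var offset = 0
  shapes.map { case (rows, cols) =>
    val size = rows * cols
    val slice = weights.data.slice(offset, offset + size)
    offset += size
    DenseMatrix(rows, cols, slice)
  }
}

That way the flattening/reassembly you mention stays an implementation
detail of the ANN model, and the Optimizer only ever sees a flat weight
vector.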
On Mon, Mar 28, 2016 at 8:01 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

> Hey,
>
> I have a working prototype of a multi-layer perceptron implementation
> working in Flink.
>
> I made every possible effort to utilize existing code when possible.
>
> In the process of doing this there were some hacks I want/need, and I think
> this should be broken up into multiple PRs and possibly abstract out the
> whole thing, because the MLP implementation I came up with is itself
> designed to be extendable to Long Short-Term Memory networks.
>
> At the top level, here are some of the sub-PRs:
>
> - Expand SGD to allow for predicting vectors instead of just Doubles. This
> allows the same NN code (and other algos) to be used for classification,
> transformations, and regressions.
>
> - Allow for 'warm starts' -> this requires adding a parameter to
> IterativeSolver that basically starts on iteration N. This is somewhat
> akin to the idea of partial fits in sklearn, OR making the iterative solver
> have some sort of internal counter so that when you call 'fit' it just
> runs another N iterations (which is set by setIterations) instead of
> assuming it is back at zero. This might seem trivial but has a significant
> impact on step size calculations.
>
> - A library of model grading metrics. Having 'calculate RSquare' as a
> built-in method for every regressor doesn't seem like an efficient way to
> do this long term.
>
> - BLAS for matrix ops (this was talked about earlier)
>
> - A neural net has Arrays of matrices of weights (instead of just a
> vector). Currently I flatten the array of matrices out into a weight
> vector and reassemble it into an array of matrices, though this is
> probably not super efficient.
>
> - The linear regression implementation currently presumes it will be using
> SGD, but I think that should be 'settable' as a parameter, because if not,
> why do we have all of those other nice SGD methods just hanging out?
> Similarly, the loss function / partial loss is hard-coded. I recommend
> making the current setup the 'defaults' of a 'setOptimizer' method, i.e.
> if you want to just run an MLR you can do it based on the examples, but if
> you want to use a fancy optimizer you can create it from existing methods,
> or make your own, and then call something like `mlr.setOptimizer(myOptimizer)`.
>
> - and more
>
> At any rate, if some people could weigh in / direct me on how to proceed,
> that would be swell.
>
> Thanks!
> tg
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." -Virgil*
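P.S. Regarding the setOptimizer idea: just so we are talking about the same
thing, this is roughly the user-facing flow I would imagine. None of this
exists today; setOptimizer is the proposed method, and the solver and
DataSet names are only placeholders:

// A concrete solver configured from the existing building blocks
// (placeholder choice; any IterativeSolver could go here).
val sgd = GradientDescentL2()
  .setIterations(200)
  .setStepsize(0.1)

val mlr = MultipleLinearRegression()
mlr.setOptimizer(sgd)        // proposed method, does not exist yet
mlr.fit(trainingDS)          // trainingDS/testDS: placeholder DataSets of LabeledVector
val predictions = mlr.predict(testDS)

If no optimizer is set, MLR would keep exactly the defaults it has now, so
the existing examples would keep working unchanged.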