Hey, I have a working prototype of a multilayer perceptron (MLP) implementation in Flink.
I made every effort to reuse existing code where possible. In the process there were some hacks I want/need, so I think this should be broken up into multiple PRs, and possibly the whole thing should be abstracted out, because the MLP implementation I came up with is itself designed to be extendable to Long Short-Term Memory (LSTM) networks. Top level, here are some of the sub-PRs:

- Expand SGD to allow for predicting vectors instead of just Doubles. This allows the same NN code (and other algorithms) to be used for classification, transformations, and regression (first sketch below).

- Allow for 'warm starts'. This requires adding a parameter to IterativeSolver that basically starts on iteration N. It is somewhat akin to partial fits in sklearn, or to giving the iterative solver an internal counter so that each call to 'fit' runs another N iterations (set by setIterations) instead of assuming it starts back at zero. This might seem trivial, but it has a significant impact on step size calculations (second sketch below).

- A library of model grading metrics. Having 'calculate R-squared' as a built-in method on every regressor doesn't seem like an efficient way to do this long term (third sketch below).

- BLAS for matrix ops (this was talked about earlier; fourth sketch below).

- A neural net has arrays of matrices of weights (instead of just a vector). Currently I flatten the array of matrices into a weight vector and reassemble it into an array of matrices, though this is probably not very efficient (fifth sketch below).

- The linear regression implementation currently presumes it will use SGD, but I think that should be settable as a parameter; if not, why do we have all of those other nice SGD methods just hanging out? Similarly, the loss function / partial loss is hard-coded. I recommend making the current setup the defaults of a 'setOptimizer' method. That is, if you just want to run an MLR you can do it based on the examples, but if you want to use a fancy optimizer you can create it from existing methods, or make your own, and then call something like `mlr.setOptimizer( myOptimizer )` (last sketch below).

- and more.

At any rate, if some people could weigh in / direct me on how to proceed, that would be swell.

Thanks!
tg

Trevor Grant
Data Scientist

https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
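P.S. A few rough Scala sketches of what I mean, in the same order as the list above. These are sketches against my reading of flink-ml, not finished code, and every name I introduce here (VectorLabeledVector, VectorPredictionFunction, WarmStartSolver, RegressionScores, WeightPacking, setOptimizer) is hypothetical.

For vector-valued SGD: today the training examples carry a Double label; one way the example type and prediction function could generalize is to let the label itself be a Vector.

```scala
import org.apache.flink.ml.math.Vector

// Hypothetical analogue of LabeledVector whose label is a Vector rather than
// a Double, so one optimizer can drive regression, classification (one-hot
// labels), and transformations through the same code path.
case class VectorLabeledVector(label: Vector, vector: Vector)

// Hypothetical prediction function that returns a Vector instead of a Double.
trait VectorPredictionFunction extends Serializable {
  def predict(features: Vector, weights: Vector): Vector
}
```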
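For warm starts, the step size point concretely: with the usual stepsize / sqrt(t) decay, restarting t at 1 on a second call to fit makes the step size jump back up. A minimal sketch of the internal-counter variant (WarmStartSolver is a made-up stand-in for what IterativeSolver would need to do):

```scala
// Sketch: a solver that remembers how many iterations it has already run, so
// a second fit() continues the step size schedule instead of resetting it.
class WarmStartSolver(initialStepsize: Double) {
  private var iterationsRun: Int = 0 // survives across calls to fit()

  // Common stepsize / sqrt(t) decay; t counts all iterations ever run.
  def effectiveStepsize(localIteration: Int): Double =
    initialStepsize / math.sqrt((iterationsRun + localIteration).toDouble)

  def fit(iterations: Int): Unit = {
    for (i <- 1 to iterations) {
      val eta = effectiveStepsize(i)
      // ... run one SGD sweep with step size eta ...
    }
    iterationsRun += iterations
  }
}
```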
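For the metrics library: R-squared as a free-standing function over (truth, prediction) pairs instead of a method baked into every regressor. Plain Scala collections for clarity; a real version would compute the same three sums over a DataSet.

```scala
// Minimal sketch of a standalone model-grading module.
object RegressionScores {
  // R^2 = 1 - SS_res / SS_tot over (truth, prediction) pairs.
  def rSquared(pairs: Seq[(Double, Double)]): Double = {
    val mean  = pairs.map(_._1).sum / pairs.size
    val ssRes = pairs.map { case (y, yHat) => (y - yHat) * (y - yHat) }.sum
    val ssTot = pairs.map { case (y, _)    => (y - mean) * (y - mean) }.sum
    1.0 - ssRes / ssTot
  }
}
```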
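For the BLAS item, one concrete direction (an illustration, not a decision): route multiplies through netlib-java, the binding Spark MLlib uses, with column-major Array[Double] storage.

```scala
import com.github.fommil.netlib.BLAS

// C = A * B via dgemm, everything column-major.
val (m, k, n) = (2, 3, 2)
val a = Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0) // 2x3: [[1,2,3],[4,5,6]]
val b = Array(1.0, 0.0, 0.0, 0.0, 1.0, 0.0) // 3x2: first two columns of I
val c = new Array[Double](m * n)
BLAS.getInstance().dgemm("N", "N", m, n, k, 1.0, a, m, b, k, 0.0, c, m)
// c is now [[1,2],[4,5]] in column-major order: Array(1.0, 4.0, 2.0, 5.0)
```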
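For the array-of-matrices point, this is roughly the round trip I pay for today (WeightPacking is a made-up name; I'm assuming flink-ml's DenseMatrix(numRows, numCols, data) column-major layout). Every iteration does these copies, which is part of why I think the solver should eventually understand richer weight types.

```scala
import org.apache.flink.ml.math.{DenseMatrix, DenseVector}

object WeightPacking {
  // Flatten each layer's weight matrix end-to-end into one vector that the
  // existing Double-vector solver can update.
  def flatten(layers: Array[DenseMatrix]): DenseVector =
    DenseVector(layers.flatMap(_.data))

  // Rebuild the per-layer matrices from the flat vector, given each layer's
  // (rows, cols) shape.
  def unflatten(flat: DenseVector, shapes: Array[(Int, Int)]): Array[DenseMatrix] = {
    var offset = 0
    shapes.map { case (rows, cols) =>
      val size  = rows * cols
      val slice = flat.data.slice(offset, offset + size)
      offset += size
      new DenseMatrix(rows, cols, slice)
    }
  }
}
```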
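And the setOptimizer idea end to end. The optimizer is assembled from pieces I believe already exist in flink-ml (GradientDescent, GenericLossFunction, SquaredLoss, LinearPrediction); the setOptimizer call itself is the proposed new method, with roughly today's hard-coded setup becoming the default when it isn't called.

```scala
import org.apache.flink.ml.optimization.{GenericLossFunction, GradientDescent, LinearPrediction, SquaredLoss}
import org.apache.flink.ml.regression.MultipleLinearRegression

// Build the optimizer explicitly from existing pieces, instead of having the
// regressor construct it internally...
val myOptimizer = GradientDescent()
  .setLossFunction(GenericLossFunction(SquaredLoss, LinearPrediction))
  .setIterations(100)
  .setStepsize(0.1)

// ...then hand it to the regressor. setOptimizer is the proposed method and
// does not exist yet.
val mlr = MultipleLinearRegression()
mlr.setOptimizer(myOptimizer)
```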