OK, I'm trying to respond to you and Till in one thread, so please call me
out if I missed a point, but here goes:

SGD Predicting Vectors: There was discussion about this in the past; at
the time it was decided to go with only Doubles for simplicity. I feel
strongly that there is now cause for predicting vectors. This should be a
separate PR. I'll open an issue; we can refer to the earlier mailing list
discussion and reopen it to settle the best way to proceed.
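
To make that concrete, here's a minimal sketch of the kind of abstraction
Theodore described below (abstracting the input/output type of the
Optimizer). Every name in it is hypothetical, not existing FlinkML API:

// Hypothetical sketch: parameterize the prediction type so a single
// Optimizer interface covers both Double and Vector targets.
trait Predictor[Out] {
  def predict(weights: Array[Double], features: Array[Double]): Out
}

// Scalar case: a dot product yields a Double, as in linear regression.
object ScalarPredictor extends Predictor[Double] {
  def predict(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum
}

// Vector case: a weight matrix (flattened row-major) times the input
// yields a vector, as an ANN output layer would.
class VectorPredictor(outDim: Int) extends Predictor[Array[Double]] {
  def predict(w: Array[Double], x: Array[Double]): Array[Double] = {
    val inDim = x.length
    Array.tabulate(outDim) { r =>
      (0 until inDim).map(c => w(r * inDim + c) * x(c)).sum
    }
  }
}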

Warm Starts: Basically all that needs to be done here is for the iterative
solver to keep track of which iteration it is on, start from that
iteration if warmStart == true, and then go another N iterations. I don't
think savepoints solve this, because of the way step sizes are calculated
in SGD, though I don't know enough about savepoints to say for sure. As
Till said, and I agree, it's a very simple fix. Use cases: testing how new
features (e.g. step sizes) speed up or slow down convergence, e.g. fit a
model in 1000-data-point bursts and measure how the error decreases as
time goes on. Also, model updates: e.g. I have a huge model that gets
trained on a year of data and takes a day or two to do so, but after that
I just want to update it nightly with the data from the last 24 hours, or
at the extreme, online learning, where every new data point updates the
model.
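
To illustrate what I mean (a hedged sketch only; warmStart, the counter,
and the step-size formula are placeholders, not the actual FlinkML
implementation):

// Sketch of warm starts: the solver remembers its global iteration
// count so the step-size schedule continues where it left off.
class WarmStartSolverSketch(var warmStart: Boolean = false,
                            var numberOfIterations: Int = 100) {
  private var iterationsDone: Int = 0 // global counter across fits

  def optimize(weights: Array[Double],
               gradient: Array[Double] => Array[Double]): Array[Double] = {
    val first = if (warmStart) iterationsDone else 0
    var w = weights
    for (i <- first until first + numberOfIterations) {
      // The step size decays with the global iteration i, which is why
      // naively restarting from zero would change the learning schedule.
      val stepSize = 0.1 / math.sqrt(i + 1.0)
      val g = gradient(w)
      w = w.zip(g).map { case (wi, gi) => wi - stepSize * gi }
    }
    iterationsDone = first + numberOfIterations
    w
  }
}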

Model Grading Metrics:  I'll chime in on the PR you mentioned.

Weight Arrays vs. Weight Vectors: The consensus seems to be that
winding/unwinding arrays of matrices into vectors is best done inside the
methods that need such functionality. I'm OK with that, as I have such
things working rather elegantly, but wanted to throw it out there anyway.
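
For reference, here is a toy sketch of the winding/unwinding I mean
(row-major flattening; the Mat case class is purely illustrative):

// Flatten an array of weight matrices into one weight vector, and
// reassemble it given the layer shapes.
case class Mat(rows: Int, cols: Int, data: Array[Double])

def flatten(layers: Array[Mat]): Array[Double] =
  layers.flatMap(_.data)

def unflatten(weights: Array[Double],
              shapes: Array[(Int, Int)]): Array[Mat] = {
  var offset = 0
  shapes.map { case (r, c) =>
    val m = Mat(r, c, weights.slice(offset, offset + r * c))
    offset += r * c
    m
  }
}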

BLAS ops for matrices:  I'll take care of this in my code.

Adding a 'setOptimizer' parameter to IterativeSolver: Theodore deferred to
Till, and Till said to open a PR. I'll make the default SimpleSGD to
maintain backwards compatibility.
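
Something along these lines (a sketch; the Solver trait and method bodies
here are placeholders, only setOptimizer and the SimpleSGD default follow
the discussion):

// Sketch of a settable optimizer with a backwards-compatible default.
trait Solver {
  def optimize(weights: Array[Double]): Array[Double]
}

class SimpleSGD extends Solver {
  // Placeholder body; the real SimpleSGD runs SGD iterations.
  def optimize(weights: Array[Double]): Array[Double] = weights
}

class MultipleLinearRegression {
  // Defaults to SimpleSGD so existing code keeps working unchanged.
  private var solver: Solver = new SimpleSGD

  def setOptimizer(opt: Solver): this.type = {
    solver = opt
    this
  }
}

// Usage, as in the earlier thread: mlr.setOptimizer(myOptimizer)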

New issues to create:
[  ] Optimizer to predict vectors or Doubles and maintain backwards
compatibility.
[  ] Warm Start Functionality
[  ] Add setOptimizer to IterativeSolver, with SimpleSGD as the default.
[  ] Add neuralnets package to FlinkML (Multilayer perceptron is first
iteration, other flavors to follow).

Let me know if I missed anything.  I'm guessing you guys are done for the
day, so I'll wait until tomorrow night my time (Chicago) before I move
ahead on anything, to give you a chance to respond.

Thanks!
tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Tue, Mar 29, 2016 at 4:11 AM, Theodore Vasiloudis <
theodoros.vasilou...@gmail.com> wrote:

> Hello Trevor,
>
> These are indeed a lot of issues, let's see if we can fit the discussion
> for all of them
> in one thread.
>
> I'll add some comments inline.
>
> > - Expand SGD to allow for predicting vectors instead of just Doubles.
>
>
> We have discussed this in the past, and at that point decided that it
> didn't make sense to change the base SGD implementation to accommodate
> vectors.  The alternatives that were presented at the time were to
> abstract away the type of the input/output in the Optimizer (allowing for
> both Vectors and Doubles), or to create specialized classes for each case.
> That also gives us greater flexibility in terms of optimizing performance.
>
> In terms of the ANN, I think you can hide away the Vectors in the
> implementation of the ANN model, and use the Optimizer interface as is,
> like A. Ulanov did with the Spark ANN implementation
> <https://github.com/apache/spark/pull/7621/files>.
>
> > - Allow for 'warm starts'
>
>
> I like the idea of having a partial_fit-like function; could you present
> a couple of use cases where we might use it? I'm wondering if savepoints
> already cover this functionality.
>
> > - A library of model grading metrics.
> >
>
> We have a (perpetually) open PR <https://github.com/apache/flink/pull/871>
> for an evaluation framework. Could you expand on "Having 'calculate
> RSquare' as a built-in method for every regressor doesn't seem like an
> efficient way to do this long term"?
>
> > - BLAS for matrix ops (this was talked about earlier)
>
>
> This will be a good addition. If they are specific to the ANN
> implementation, however, I would hide them away from the rest of the code
> (and include them in that PR only) until another use case comes up.
>
> > - A neural net has Arrays of matrices of weights (instead of just a
> > vector).
> >
>
> Yes, this is probably not the most efficient way to do this, but it's the
> "least API-breaking" option, I'm afraid.
>
> > - The linear regression implementation currently presumes it will be
> > using SGD, but I think that should be 'settable' as a parameter
> >
>
> The original Optimizer was written the way you described, but we changed it
> later IIRC to make it more accessible (e.g. for users that don't know that
> you can't match L1 regularization with L-BFGS). Maybe Till can say more
> about the other reasons this was changed.
>
>
> On Mon, Mar 28, 2016 at 8:01 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
> > Hey,
> >
> > I have a working prototype of a multilayer perceptron implementation
> > in Flink.
> >
> > I made every effort to utilize existing code where possible.
> >
> > In the process of doing this there were some hacks I want/need, and I
> > think this should be broken up into multiple PRs, and possibly the
> > whole thing should be abstracted out, because the MLP implementation I
> > came up with is itself designed to be extendable to Long Short-Term
> > Memory networks.
> >
> > At the top level, here are some of the sub-PRs:
> >
> > - Expand SGD to allow for predicting vectors instead of just Doubles.
> > This allows the same NN code (and other algos) to be used for
> > classification, transformations, and regressions.
> >
> > - Allow for 'warm starts' -> this requires adding a parameter to
> > IterativeSolver that basically starts on iteration N.  This is somewhat
> > akin to the idea of partial fits in sklearn, OR making the iterative
> > solver have some sort of internal counter so that when you call 'fit'
> > it just runs another N iterations (which is set by SetIterations)
> > instead of assuming it is back to zero.  This might seem trivial but
> > has a significant impact on step size calculations.
> >
> > - A library of model grading metrics. Having 'calculate RSquare' as a
> > built-in method for every regressor doesn't seem like an efficient way
> > to do this long term.
> >
> > - BLAS for matrix ops (this was talked about earlier)
> >
> > - A neural net has Arrays of matrices of weights (instead of just a
> > vector).  Currently I flatten the array of matrices out into a weight
> > vector and reassemble it into an array of matrices, though this is
> > probably not super efficient.
> >
> > - The linear regression implementation currently presumes it will be
> > using SGD, but I think that should be 'settable' as a parameter,
> > because if not, why do we have all of those other nice SGD methods just
> > hanging out?  Similarly, the loss function / partial loss is
> > hard-coded.  I recommend making the current setup the 'defaults' of a
> > 'setOptimizer' method.  I.e. if you want to just run an MLR you can do
> > it based on the examples, but if you want to use a fancy optimizer you
> > can create one from existing methods, or make your own, then call
> > something like `mlr.setOptimizer(myOptimizer)`.
> >
> > - and more
> >
> > At any rate, if some people could weigh in / direct me on how to
> > proceed, that would be swell.
> >
> > Thanks!
> > tg
> >
> >
> >
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> >
>
