> Adding a setOptimizer to IterativeSolver.

Do you mean MLR here? IterativeSolver is implemented by several different
solvers, so I don't think adding a method like this makes sense there.

In the case of MLR, a better alternative that involves a bit more work is
to create a Generalized Linear Model framework that provides
implementations for the most common linear models (ridge, lasso, etc.). I
had already started work on this here
<https://github.com/thvasilo/flink/commits/glm>, but never got around to
opening a PR. The relevant JIRA is here
<https://issues.apache.org/jira/browse/FLINK-2013>. Having a setOptimizer
method in GeneralizedLinearModel (with some restrictions/warnings
regarding the choice of optimizer and regularization) would be the
preferred option for me, at least; see the sketch below.
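
To make this concrete, here is a rough sketch of what that could look
like. The class is hypothetical and the names are illustrative, not a
final API; the only existing type I'm assuming is FlinkML's
IterativeSolver:

    import org.apache.flink.ml.optimization.IterativeSolver

    // Hypothetical GLM base class: setOptimizer validates the choice,
    // so e.g. a lasso model can reject solvers that can't handle L1.
    abstract class GeneralizedLinearModel {
      protected def defaultOptimizer: IterativeSolver
      protected def isSupported(solver: IterativeSolver): Boolean

      private var chosen: Option[IterativeSolver] = None
      protected def optimizer: IterativeSolver =
        chosen.getOrElse(defaultOptimizer)

      def setOptimizer(solver: IterativeSolver): this.type = {
        require(isSupported(solver),
          s"${solver.getClass.getSimpleName} is not supported by this model")
        chosen = Some(solver)
        this
      }
    }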

Other than that the list looks fine :)

On Tue, Mar 29, 2016 at 9:32 PM, Trevor Grant <trevor.d.gr...@gmail.com>
wrote:

> OK, I'm trying to respond to you and Till in one thread, so someone call
> me out if I missed a point, but here goes:
>
> SGD Predicting Vectors: There was discussion in the past regarding this;
> at the time it was decided to go with only Doubles for simplicity. I feel
> strongly that there is cause now for predicting vectors. This should be a
> separate PR. I'll open an issue; we can refer to the earlier mailing list
> thread and reopen the discussion on the best way to proceed.
>
> Warm Starts: Basically all that needs to be done here is for the
> iterative solver to keep track of what iteration it is on, and if
> WarmStart == true, start from that iteration and go another N iterations.
> I don't think savepoints solve this, because of the way step sizes are
> calculated in SGD, though I don't know enough about savepoints to say for
> sure. As Till said, and I agree, this is a very simple fix. Use cases:
> testing how new features (e.g. step sizes) speed up or slow down
> convergence, e.g. fit a model in 1000-data-point bursts and measure the
> error to see how it decreases as time goes on. Also, model updates: e.g.
> I have a huge model that gets trained on a year of data and takes a day
> or two to do so, but after that I just want to update it nightly with the
> data from the last 24 hours, or, at the extreme, online learning, where
> every new data point updates the model.
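>
> To make that concrete, here is a minimal sketch of the counter I mean
> (illustrative only, not the actual IterativeSolver code; I'm assuming
> the stepsize / sqrt(iteration) decay the current SGD uses):
>
>     // Tracks completed iterations across calls so a warm start resumes
>     // the step size schedule instead of resetting it.
>     class WarmStartCounter(stepsize: Double, numIterations: Int) {
>       private var iterationsDone = 0
>
>       def run(warmStart: Boolean)(sweep: Double => Unit): Unit = {
>         val start = if (warmStart) iterationsDone else 0
>         for (i <- start until start + numIterations) {
>           sweep(stepsize / math.sqrt(i + 1)) // one SGD pass at this step
>         }
>         iterationsDone = start + numIterations
>       }
>     }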
>
> Model Grading Metrics:  I'll chime in on the PR you mentioned.
>
> Weight Arrays vs. Weight Vectors: The consensus seems to be that
> winding/unwinding arrays of matrices into vectors is best done inside the
> methods that need such functionality. I'm OK with that, as I have such
> things working rather elegantly, but wanted to throw it out there anyway.
>
> BLAS ops for matrices:  I'll take care of this in my code.
>
> Adding a 'setOptimizer' parameter to IterativeSolver: Theodore deferred
> to Till, and Till said open a PR. I'll make the default SimpleSGD to
> maintain backwards compatibility.
>
> New issues to create:
> [  ] Optimizer to predict vectors or Doubles and maintain backwards
> compatibility.
> [  ] Warm Start Functionality
> [  ] setOptimizer on IterativeSolver, with a default of SimpleSGD.
> [  ] Add neuralnets package to FlinkML (Multilayer perceptron is first
> iteration, other flavors to follow).
>
> Let me know if I missed anything.  I'm guessing you guys are done for the
> day, so I'll wait until tomorrow night my time (Chicago) before I move
> ahead on anything, to give you a chance to respond.
>
> Thanks!
> tg
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Tue, Mar 29, 2016 at 4:11 AM, Theodore Vasiloudis <
> theodoros.vasilou...@gmail.com> wrote:
>
> > Hello Trevor,
> >
> > These are indeed a lot of issues; let's see if we can fit the
> > discussion for all of them in one thread.
> >
> > I'll add some comments inline.
> >
> > > - Expand SGD to allow for predicting vectors instead of just Doubles.
> >
> >
> > We have discussed this in the past and at that point decided that it
> > didn't make sense to change the base SGD implementation to accommodate
> > vectors. The alternatives that were presented at the time were to
> > abstract away the type of the input/output in the Optimizer (allowing
> > for both Vectors and Doubles), or to create specialized classes for
> > each case. That also gives us greater flexibility in terms of
> > optimizing performance.
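> >
> > Concretely, the abstraction could look something like this (a sketch,
> > not a settled design; TypedOptimizer is a made-up name, the other
> > types are existing FlinkML ones):
> >
> >     import org.apache.flink.api.scala.DataSet
> >     import org.apache.flink.ml.common.WeightVector
> >     import org.apache.flink.ml.math.Vector
> >
> >     // Abstract over the target type: T = Double for today's
> >     // regressors, T = Vector for multi-output models such as an ANN.
> >     trait TypedOptimizer[T] {
> >       def optimize(
> >           data: DataSet[(Vector, T)],
> >           initialWeights: Option[DataSet[WeightVector]])
> >         : DataSet[WeightVector]
> >     }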
> >
> > In terms of the ANN, I think you can hide away the Vectors in the
> > implementation of the ANN model, and use the Optimizer interface as is,
> > like A. Ulanov did with the Spark ANN implementation
> > <https://github.com/apache/spark/pull/7621/files>.
> >
> > > - Allow for 'warm starts'
> >
> >
> > I like the idea of having a partialFit-like function. Could you present
> > a couple of use cases where we might use it? I'm wondering if
> > savepoints already cover this functionality.
> >
> > > - A library of model grading metrics.
> >
> > We have a (perpetually) open PR
> > <https://github.com/apache/flink/pull/871> for an evaluation framework.
> > Could you expand on "Having 'calculate RSquare' as a built-in method
> > for every regressor doesn't seem like an efficient way to do this long
> > term."
> >
> > > - BLAS for matrix ops (this was talked about earlier)
> >
> >
> > This will be a good addition. If they are specific to the ANN
> > implementation, however, I would hide them away from the rest of the
> > code (and include them in that PR only) until another use case comes
> > up.
> >
> > > - A neural net has Arrays of matrices of weights (instead of just a
> > > vector).
> >
> > Yes, this is probably not the most efficient way to do this, but it's
> > the "least API breaking" option, I'm afraid.
> >
> > > - The linear regression implementation currently presumes it will be
> > > using SGD but I think that should be 'settable' as a parameter
> >
> > The original Optimizer was written the way you described, but we
> > changed it later, IIRC, to make it more accessible (e.g. for users who
> > don't know that you can't match L1 regularization with L-BFGS). Maybe
> > Till can say more about the other reasons this was changed.
> >
> >
> > On Mon, Mar 28, 2016 at 8:01 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> > wrote:
> >
> > > Hey,
> > >
> > > I have a working prototype of a multilayer perceptron implementation
> > > in Flink.
> > >
> > > I made every effort to utilize existing code where possible.
> > >
> > > In the process of doing this there were some hacks I want/need, and I
> > > think this should be broken up into multiple PRs, possibly
> > > abstracting out the whole thing, because the MLP implementation I
> > > came up with is itself designed to be extensible to Long Short-Term
> > > Memory networks.
> > >
> > > At a top level, here are some of the sub-PRs:
> > >
> > > - Expand SGD to allow for predicting vectors instead of just Doubles.
> > > This allows the same NN code (and other algos) to be used for
> > > classification, transformations, and regressions.
> > >
> > > - Allow for 'warm starts' -> this requires adding a parameter to
> > > IterativeSolver that basically starts on iteration N. This is
> > > somewhat akin to the idea of partial fits in sklearn, OR making the
> > > iterative solver have some sort of internal counter so that when you
> > > call 'fit' it just runs another N iterations (set by setIterations)
> > > instead of assuming it is back at zero. This might seem trivial but
> > > has a significant impact on step size calculations.
> > >
> > > - A library of model grading metrics. Having 'calculate RSquare' as a
> > > built-in method for every regressor doesn't seem like an efficient
> > > way to do this long term.
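> > >
> > > To illustrate, here is roughly what such a metric could look like as
> > > a free function (a sketch over a local Seq just to keep it short; a
> > > real version would operate on a DataSet):
> > >
> > >     // R^2 computed from (truth, prediction) pairs, usable with any
> > >     // regressor rather than being a method on each one.
> > >     def rSquared(pairs: Seq[(Double, Double)]): Double = {
> > >       val meanY = pairs.map(_._1).sum / pairs.size
> > >       val ssTot = pairs.map { case (y, _) => (y - meanY) * (y - meanY) }.sum
> > >       val ssRes = pairs.map { case (y, p) => (y - p) * (y - p) }.sum
> > >       1.0 - ssRes / ssTot
> > >     }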
> > >
> > > - BLAS for matrix ops (this was talked about earlier)
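> > >
> > > Roughly what I have in mind, sketched against netlib-java (which
> > > comes in via Breeze, which FlinkML already uses); dgemm is the real
> > > netlib routine, the wrapper itself is hypothetical:
> > >
> > >     import com.github.fommil.netlib.{BLAS => NetlibBLAS}
> > >     import org.apache.flink.ml.math.DenseMatrix
> > >
> > >     // C := alpha * A * B + beta * C for column-major DenseMatrix
> > >     def gemm(alpha: Double, a: DenseMatrix, b: DenseMatrix,
> > >              beta: Double, c: DenseMatrix): Unit =
> > >       NetlibBLAS.getInstance().dgemm("N", "N", a.numRows, b.numCols,
> > >         a.numCols, alpha, a.data, a.numRows, b.data, b.numRows,
> > >         beta, c.data, c.numRows)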
> > >
> > > - A neural net has Arrays of matrices of weights (instead of just a
> > > vector). Currently I flatten the array of matrices out into a weight
> > > vector and reassemble it into an array of matrices, though this is
> > > probably not super efficient.
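> > >
> > > The wind/unwind I have looks roughly like this (a sketch using
> > > FlinkML's DenseMatrix; the real code has more bookkeeping):
> > >
> > >     import org.apache.flink.ml.math.DenseMatrix
> > >
> > >     // Flatten per-layer weight matrices into one flat array...
> > >     def wind(layers: Array[DenseMatrix]): Array[Double] =
> > >       layers.flatMap(_.data)
> > >
> > >     // ...and rebuild them given each layer's (rows, cols) shape.
> > >     def unwind(flat: Array[Double],
> > >                shapes: Array[(Int, Int)]): Array[DenseMatrix] = {
> > >       var offset = 0
> > >       shapes.map { case (rows, cols) =>
> > >         val size = rows * cols
> > >         val m = new DenseMatrix(rows, cols,
> > >           java.util.Arrays.copyOfRange(flat, offset, offset + size))
> > >         offset += size
> > >         m
> > >       }
> > >     }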
> > >
> > > - The linear regression implementation currently presumes it will be
> > > using SGD, but I think that should be 'settable' as a parameter,
> > > because if not, why do we have all of those other nice SGD methods
> > > just hanging out? Similarly, the loss function / partial loss is
> > > hard-coded. I recommend making the current setup the 'defaults' of a
> > > 'setOptimizer' method. I.e., if you want to just run an MLR you can
> > > do it based on the examples, but if you want to use a fancy optimizer
> > > you can create it from existing methods, or make your own, then call
> > > something like `mlr.setOptimizer(myOptimizer)`.
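> > >
> > > For example (hypothetical: setOptimizer doesn't exist on
> > > MultipleLinearRegression today, and I'm assuming one of the existing
> > > GradientDescent variants for the solver):
> > >
> > >     import org.apache.flink.ml.regression.MultipleLinearRegression
> > >     import org.apache.flink.ml.optimization.GradientDescentL2
> > >
> > >     // Configure a solver explicitly instead of the baked-in default
> > >     val myOptimizer = GradientDescentL2()
> > >       .setStepsize(0.1)
> > >       .setIterations(200)
> > >
> > >     val mlr = MultipleLinearRegression()
> > >     mlr.setOptimizer(myOptimizer) // the proposed method
> > >     mlr.fit(trainingData)         // trainingData: DataSet[LabeledVector]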
> > >
> > > - and more
> > >
> > > At any rate, if some people could weigh in / direct me on how to
> > > proceed, that would be swell.
> > >
> > > Thanks!
> > > tg
> > >
> > >
> > >
> > >
> > > Trevor Grant
> > > Data Scientist
> > > https://github.com/rawkintrevo
> > > http://stackexchange.com/users/3002022/rawkintrevo
> > > http://trevorgrant.org
> > >
> > > *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> > >
> >
>
