Re: [MLlib] Performance problem in GeneralizedLinearAlgorithm

2015-02-23 Thread Josh Devins
Thanks for the pointer, Peter. That change will indeed fix this bug, and it looks like it will make it into the upcoming 1.3.0 release.

@Evan, for reference, completeness and posterity:

> Just to be clear - you're currently calling .persist() before you pass data
> to LogisticRegressionWithLBFGS?

Re: [MLlib] Performance problem in GeneralizedLinearAlgorithm

2015-02-17 Thread Peter Rudenko
It's fixed today: https://github.com/apache/spark/pull/4593

Thanks,
Peter Rudenko

On 2015-02-17 18:25, Evan R. Sparks wrote:
Josh - thanks for the detailed write up - this seems a little funny to me. I agree that with the current code path there is extra work being done than needs to be (e.g. th

Re: [MLlib] Performance problem in GeneralizedLinearAlgorithm

2015-02-17 Thread Evan R. Sparks
Josh - thanks for the detailed write up - this seems a little funny to me. I agree that with the current code path there is more work being done than needs to be (e.g. the features are re-scaled at every iteration, but the relatively costly process of fitting the StandardScaler should not be re-do
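
To make the distinction in this message concrete, here is a minimal sketch in plain Python (not Spark or MLlib code). It separates the one-time cost of fitting a scaler from the per-iteration cost of re-scaling the features, and shows how caching the scaled features removes the repeated work. `CountingScaler`, `gradient_step` and both `run_*` functions are hypothetical names invented for this illustration; the scaler is a toy that only counts calls and is not assumed to match the behaviour of MLlib's StandardScaler.

```python
import math


class CountingScaler:
    """Toy standard scaler that counts calls (hypothetical, not the MLlib class)."""

    def __init__(self):
        self.fit_calls = 0
        self.transform_calls = 0
        self.mean = None
        self.std = None

    def fit(self, rows):
        # The "costly" step: one full pass to compute per-column mean and std.
        self.fit_calls += 1
        n = len(rows)
        dims = len(rows[0])
        self.mean = [sum(r[j] for r in rows) / n for j in range(dims)]
        # Population std; constant columns fall back to 1.0 to avoid dividing by zero.
        self.std = [
            math.sqrt(sum((r[j] - self.mean[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(dims)
        ]
        return self

    def transform(self, rows):
        # The cheaper step: apply the already-fitted mean and std to every row.
        self.transform_calls += 1
        return [
            [(r[j] - self.mean[j]) / self.std[j] for j in range(len(r))]
            for r in rows
        ]


def gradient_step(scaled_rows):
    """Placeholder for one optimizer iteration over the scaled data."""
    return len(scaled_rows)


def run_rescaling_each_iteration(rows, iterations):
    """Fit once, but re-scale the features inside every iteration."""
    scaler = CountingScaler().fit(rows)
    scaled = None
    for _ in range(iterations):
        scaled = scaler.transform(rows)
        gradient_step(scaled)
    return scaler, scaled


def run_with_cached_features(rows, iterations):
    """Fit once, scale once, and reuse the cached result in every iteration."""
    scaler = CountingScaler().fit(rows)
    scaled = scaler.transform(rows)
    for _ in range(iterations):
        gradient_step(scaled)
    return scaler, scaled
```

With five iterations over the same rows, the first variant calls `transform` five times and the second calls it once, while both fit exactly once and produce identical scaled features. Whether the real code path behaves like either variant depends on the Spark version and on whether the input was persisted, which is the question this thread is discussing.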