Hi Jeff,

I have actually had an implementation of robust regression with Huber loss for a long time (https://github.com/apache/spark/pull/14326). It is a fairly straightforward port of scikit-learn's HuberRegressor. The PR currently makes Huber regression a separate Estimator, but we found that it can be merged into LinearRegression. I will update the PR ASAP, and I look forward to your reviews and comments. After the Scala implementation is merged, it will be easy to add the corresponding PySpark API, and you can then use it to train Huber regression models in a distributed environment.
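For reference, here is a minimal plain-Python sketch of the textbook Huber loss that this kind of robust regression is built on. It is only an illustration, not the code from the PR or from scikit-learn (whose HuberRegressor additionally estimates a scale parameter). The helper names `huber_loss` and `total_huber_loss` are hypothetical; the default `epsilon=1.35` is borrowed from scikit-learn's HuberRegressor default.

```python
def huber_loss(residual, epsilon=1.35):
    """Textbook Huber loss for one residual (hypothetical helper).

    Quadratic for small residuals, linear for large ones, so outliers
    contribute less than they would under squared error.
    """
    abs_r = abs(residual)
    if abs_r <= epsilon:
        # Inlier region: same as half the squared error.
        return 0.5 * residual ** 2
    # Outlier region: grows linearly, continuous at |r| == epsilon.
    return epsilon * (abs_r - 0.5 * epsilon)


def total_huber_loss(y_true, y_pred, epsilon=1.35):
    """Sum of Huber losses over paired targets and predictions."""
    return sum(huber_loss(t - p, epsilon) for t, p in zip(y_true, y_pred))
```

With `epsilon=1.0`, a residual of 10 costs 9.5 instead of the 50 it would cost under half squared error, which is why a few large outliers pull the fit around much less.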

Thanks,
Yanbo

On Sun, Aug 20, 2017 at 3:19 PM, Jeff Gates <gatesa...@gmail.com> wrote:

> Hi guys,
>
> Is there Huber regression in PySpark? We are using sklearn HuberRegressor
> (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html)
> to train our model, but we are hitting a bottleneck on a single node.
> If not, is there any obstacle to implementing it in PySpark?
>
> Jeff