Wayne Zhang created SPARK-18929:
-----------------------------------
Summary: Add Tweedie distribution in GLM
Key: SPARK-18929
URL: https://issues.apache.org/jira/browse/SPARK-18929
Project: Spark
Issue Type: New Feature
Components: ML
Affects Versions: 2.0.2
Reporter: Wayne Zhang
I propose to add the full Tweedie family into the GeneralizedLinearRegression
model. The Tweedie family is characterized by a power variance function.
The currently supported Gaussian, Poisson and Gamma families
are special cases of the
[Tweedie|https://en.wikipedia.org/wiki/Tweedie_distribution] family.
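For reference, the power variance function mentioned above can be written as follows. This is a sketch in standard GLM notation, assuming p stands for the variancePower parameter used in this proposal; the symbols phi (dispersion) and mu (mean) are not named in the issue itself.

```latex
\[
  \operatorname{Var}(Y) = \phi \, V(\mu), \qquad V(\mu) = \mu^{p}
\]
% Special cases that GLM already supports:
\[
  V(\mu) =
  \begin{cases}
    1       & p = 0 \quad \text{(Gaussian)} \\
    \mu     & p = 1 \quad \text{(Poisson)} \\
    \mu^{2} & p = 2 \quad \text{(Gamma)}
  \end{cases}
\]
```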
I propose to add support for the other distributions:
* compound Poisson: 1 < variancePower < 2. Widely used to model
zero-inflated continuous data, i.e. non-negative values with a point mass at zero.
* positive stable: variancePower > 2 and variancePower != 3. Used to model
extreme values.
* inverse Gaussian: variancePower = 3.
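The ranges listed above can be summarized in a small sketch. This is plain Python for illustration only, not Spark code: the function names tweedie_family and tweedie_variance are hypothetical, and the unit default for the dispersion is an assumption. Values of variancePower below 0 are outside the scope of this proposal and are simply rejected here.

```python
def tweedie_family(variance_power: float) -> str:
    """Map a variance power p to the Tweedie family member it selects.

    Hypothetical helper for illustration; not part of any Spark API.
    """
    p = float(variance_power)
    if p == 0.0:
        return "Gaussian"
    if p == 1.0:
        return "Poisson"
    if 1.0 < p < 2.0:
        return "compound Poisson"
    if p == 2.0:
        return "Gamma"
    if p == 3.0:
        return "inverse Gaussian"
    if p > 2.0:
        return "positive stable"
    if 0.0 < p < 1.0:
        # No Tweedie distribution exists for variance powers strictly between 0 and 1.
        raise ValueError("no Tweedie distribution exists for 0 < p < 1")
    # p < 0 is not discussed in the proposal, so this sketch does not cover it.
    raise ValueError("variance power below 0 is not covered by this sketch")


def tweedie_variance(mu: float, variance_power: float, dispersion: float = 1.0) -> float:
    """Power variance function: Var(Y) = dispersion * mu ** variance_power."""
    return dispersion * mu ** variance_power


if __name__ == "__main__":
    for p in (0.0, 1.0, 1.5, 2.0, 2.5, 3.0):
        print(p, tweedie_family(p), tweedie_variance(2.0, p))
```

Checking the special values before the open ranges keeps the inverse Gaussian case (variancePower = 3) from being absorbed into the positive stable branch.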
The Tweedie family is supported in most statistical packages, such as R
(statmod), SAS and H2O.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)