[ https://issues.apache.org/jira/browse/FLINK-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stavros Kontopoulos updated FLINK-5588:
---------------------------------------
    Description:
So far ML has two scalers: min-max and the standard scaler.
A third, frequently used one is the unit scaler (scaling to unit norm).
We could implement a transformer for this type of scaling, with different norms available to the user.
An axis parameter would select whether to scale features or samples (0 for features/columns, 1 for samples/rows).
I will make this a separate class for the normalization procedure by using the Transformer API.
Scikit-learn also has some calls available outside the Transformer API; we might want to add that in the future.
Right now the existing scalers in Flink ML support per-feature normalization by using the Transformer API.

Resources
[1] https://en.wikipedia.org/wiki/Feature_scaling
[2] http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html
[3] https://spark.apache.org/docs/2.1.0/mllib-feature-extraction.html
[4] http://scikit-learn.org/stable/modules/preprocessing.html

  was:
So far ML has two scalers: min-max and the standard scaler.
A third one frequently used, is the scaler to unit.
We could implement a transformer for this type of scaling for different norms available to the user.
Axis for scaling either features or samples (0 for columns-features 1 for samples-rows).
Right now the existing scalers support per-feature normalization. I think it's trivial to add per-sample normalization.
Resources
[1] https://en.wikipedia.org/wiki/Feature_scaling
[2] http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html
[3] https://spark.apache.org/docs/2.1.0/mllib-feature-extraction.html

> Add a unit scaler based on different norms
> ------------------------------------------
>
>                 Key: FLINK-5588
>                 URL: https://issues.apache.org/jira/browse/FLINK-5588
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Stavros Kontopoulos
>            Assignee: Stavros Kontopoulos
>            Priority: Minor
>
> So far ML has two scalers: min-max and the standard scaler.
> A third, frequently used one is the unit scaler (scaling to unit norm).
> We could implement a transformer for this type of scaling, with different norms
> available to the user.
> An axis parameter would select whether to scale features or samples
> (0 for features/columns, 1 for samples/rows).
> I will make this a separate class for the normalization procedure by using
> the Transformer API.
> Scikit-learn also has some calls available outside the Transformer API; we
> might want to add that in the future.
> Right now the existing scalers in Flink ML support per-feature normalization
> by using the Transformer API.
> Resources
> [1] https://en.wikipedia.org/wiki/Feature_scaling
> [2] http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html
> [3] https://spark.apache.org/docs/2.1.0/mllib-feature-extraction.html
> [4] http://scikit-learn.org/stable/modules/preprocessing.html
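
For illustration only, the scaling described in the issue could be sketched roughly as follows. This is plain Python, not Flink ML code and not the proposed Transformer implementation; the names `p_norm` and `normalize` and the parameters `p` and `axis` are hypothetical and chosen here only to mirror the description (0 for features/columns, 1 for samples/rows). Leaving all-zero vectors unchanged is also an assumption of this sketch.

```python
import math


def p_norm(values, p=2.0):
    """Return the L^p norm of a sequence of numbers (p = math.inf gives the max norm)."""
    if p == math.inf:
        return max((abs(v) for v in values), default=0.0)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def normalize(rows, p=2.0, axis=1):
    """Scale a list of equal-length rows to unit L^p norm.

    axis=1 scales each sample (row); axis=0 scales each feature (column).
    Vectors whose norm is zero are returned unchanged (an assumption of this sketch).
    """
    if axis == 1:
        scaled = []
        for row in rows:
            n = p_norm(row, p)
            scaled.append([v / n if n else v for v in row])
        return scaled
    if axis == 0:
        # One norm per column, then divide every entry by its column's norm.
        norms = [p_norm(col, p) for col in zip(*rows)]
        return [[v / n if n else v for v, n in zip(row, norms)] for row in rows]
    raise ValueError("axis must be 0 (features/columns) or 1 (samples/rows)")


if __name__ == "__main__":
    data = [[3.0, 4.0], [1.0, 0.0]]
    print(normalize(data, p=2.0, axis=1))  # each row now has L2 norm 1
    print(normalize(data, p=1.0, axis=0))  # each column now has L1 norm 1
```

With `axis=1` and the L2 norm, the row `[3.0, 4.0]` is divided by its norm 5; with `axis=0` and the L1 norm, each column is divided by the sum of its absolute values.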